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Abstract 

We  present  an  efficient  computational  approach  to  perform  real-space  electronic  struc¬ 
ture  calculations  using  an  adaptive  higher-order  finite-element  discretization  of  Kohn- 
Sham  density-functional  theory  (DFT).  To  this  end,  we  develop  an  a  priori  mesh  adaption 
technique  to  construct  a  close  to  optimal  finite-element  discretization  of  the  problem.  We 
further  propose  an  efficient  solution  strategy  for  solving  the  discrete  eigenvalue  problem 
by  using  spectral  finite-elements  in  conjunction  with  Gauss-Lobatto  quadrature,  and  a 
Chebyshev  acceleration  technique  for  computing  the  occupied  eigenspace.  Using  the  pro¬ 
posed  solution  procedure,  we  investigate  the  computational  efficiency  afforded  by  higher- 
order  finite-element  discretizations  of  the  Kohn-Sham  DFT  problem.  Our  studies  suggest 
that  staggering  computational  savings — of  the  order  of  1000— fold — can  be  realized,  for 
both  all-electron  and  pseudopotential  calculations,  by  using  higher-order  finite-element 
discretizations.  On  all  the  benchmark  systems  studied,  we  observe  diminishing  returns  in 
computational  savings  beyond  the  sixth-order  for  accuracies  commensurate  with  chemi¬ 
cal  accuracy,  suggesting  that  the  hexic  spectral-element  may  be  an  optimal  choice  for  the 
finite-element  discretization  of  the  Kohn-Sham  DFT  problem.  A  comparative  study  of 
the  computational  efficiency  of  the  proposed  higher-order  finite-element  discretizations 
suggests  that  the  performance  of  finite-element  basis  is  competing  with  the  plane-wave 
discretization  for  non-periodic  pseudopotential  calculations,  and  is  comparable  to  the 
Gaussian  basis  for  all-electron  calculations.  Further,  we  demonstrate  the  capability  of 
the  proposed  approach  to  compute  the  electronic  structure  of  materials  systems  contain¬ 
ing  a  few  thousand  atoms  using  modest  computational  resources,  and  good  scalability  of 
the  present  implementation  up  to  a  few  hundred  processors. 
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1  Introduction 

Electronic  structure  calculations  have  played  a  significant  role  in  the  investigation  of  ma¬ 
terials  properties  over  the  past  few  decades.  In  particular,  the  Kohn-Sham  approach  to 
density  functional  theory  (DFT)  [1]  has  made  quantum- mechanically  informed  calcula¬ 
tions  on  ground-state  materials  properties  computationally  tractable,  and  has  provided 
many  important  insights  into  a  wide  range  of  materials  properties.  The  Kohn-Sham 
approach  is  based  on  the  key  result  of  Hohenberg  &  Kohn  [2]  that  the  ground-state  prop¬ 
erties  of  a  materials  system  can  be  described  by  a  functional  of  electron  density.  Though, 
the  existence  of  an  energy  functional  has  been  established  by  the  Hohenberg-Kohn  result, 
its  functional  form  is  not  known  to  date.  The  work  of  Kohn  &  Sham  [1]  addressed  this 
challenge  in  an  approximate  sense,  and  has  laid  the  foundations  for  the  practical  appli¬ 
cation  of  DFT  to  materials  systems.  The  Kohn-Sham  approach  reduces  the  many-body 
problem  of  interacting  electrons  into  an  equivalent  problem  of  non-interacting  electrons 
in  an  effective  mean  field  that  is  governed  by  the  electron  density.  This  effective  single¬ 
electron  description  is  exact  in  principle  for  ground-state  properties,  but  is  formulated  in 
terms  of  an  unknown  exchange-correlation  term  that  includes  the  quantum-mechanical 
interactions  between  electrons.  This  exchange-correlation  term  is  approximated  using 
various  models — commonly  modeled  as  an  explicit  functional  of  electron  density — and 
these  models  have  been  shown  to  predict  a  wide  range  of  materials  properties  across 
various  materials  systems.  We  note  that  the  development  of  increasingly  accurate  and 
computationally  tractable  exchange-correlation  functionals  is  an  active  research  area  in 
electronic  structure  calculations.  Though  the  Kohn-Sham  approach  greatly  reduces  the 
computational  complexity  of  the  original  many-body  Schrodinger  problem,  simulations 
of  large-scale  material  systems  with  DFT  are  still  computationally  very  demanding.  Nu¬ 
merical  algorithms  which  are  robust,  computationally  efficient  and  scalable  on  parallel 
computing  platforms  are  always  desirable  to  enable  DFT  calculations  at  larger  length 
and  time  scales,  and  on  more  complex  systems,  than  possible  heretofore. 

The  plane-wave  basis  has  traditionally  been  one  of  the  popular  basis  sets  used  for 
solving  the  Kohn-Sham  problem  [3,  4,  5].  The  plane- wave  basis  allows  for  an  efficient 
computation  of  the  electrostatic  interactions  that  are  extended  in  real-space  through 
Fourier  transforms.  However,  the  plane-wave  basis  also  has  some  notable  disadvantages. 
In  particular,  calculations  are  restricted  to  periodic  geometries  that  are  incompatible 
with  most  realistic  systems  containing  defects  (for  e.g.  dislocations).  Further,  the  plane- 
wave  basis  provides  a  uniform  spatial  resolution  which  can  be  inefficient  in  the  treat¬ 
ment  of  non-periodic  systems  like  molecules,  nano-clusters  etc.,  or  materials  systems 
with  defects,  where  higher  basis  resolution  is  often  required  in  some  spatial  regions  and 
a  coarser  resolution  suffices  elsewhere.  Moreover,  the  plane-wave  basis  is  non-local  in 
real  space,  which  significantly  affects  the  scalability  of  computations  on  parallel  com¬ 
puting  platforms.  Thus,  the  development  of  real-space  techniques  for  electronic  struc¬ 
ture  calculations  has  received  significant  attention  over  the  past  decade,  and  we  refer 
to  [6,  7,  8,  9,  10,  11,  12,  13]  and  references  therein  for  a  comprehensive  overview.  Among 
the  real-space  techniques,  the  finite-element  basis  presents  some  key  advantages — it  is 
amenable  to  unstructured  coarse-graining,  allows  for  consideration  of  complex  geome¬ 
tries  and  boundary  conditions,  and  is  scalable  on  parallel  computing  platforms.  We  refer 
to  [14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28],  and  references  therein,  for  a 
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comprehensive  overview  of  the  past  efforts  in  developing  real-space  electronic  structure 
calculations  based  on  a  finite-element  discretization. 

While  the  finite-element  basis  is  more  versatile  than  the  plane- wave  basis  [14,  18],  it  is 
not  without  its  shortcomings.  Prior  investigations  have  shown  that  linear  finite-elements 
require  a  large  number  of  basis  functions — of  the  order  of  100,  000  basis  functions  per 
atom — to  achieve  chemical  accuracy  in  electronic  structure  calculations  (cf.  e.g.  [22,  29]), 
and  this  compares  very  poorly  with  plane-wave  basis  or  other  real-space  basis  functions. 
It  has  been  demonstrated  that  higher-order  finite-element  discretizations  can  alleviate 
this  degree  of  freedom  disadvantage  of  linear  finite-elements  in  electronic  structure  cal¬ 
culations  [23,  29,  30].  However,  the  use  of  higher-order  elements  increases  the  per  basis- 
function  computational  cost  due  to  the  need  for  higher-order  accurate  numerical  quadra¬ 
ture  rules.  Furthermore,  the  bandwidth  of  the  matrix  increases  cubically  with  the  order 
of  the  finite-element,  which  in  turn  increases  the  computational  cost  of  matrix- vector 
products.  In  addition,  since  a  finite-element  basis  is  non-orthogonal,  the  discretization  of 
the  Kohn-Sham  DFT  problem  results  in  a  generalized  eigenvalue  problem,  which  is  more 
expensive  to  solve  in  comparison  to  a  standard  eigenvalue  problem  resulting  from  using 
an  orthogonal  basis  (for  e.g.  plane- wave  basis).  Thus,  the  computational  efficiency  af¬ 
forded  by  using  a  finite-element  basis  in  electronic  structure  calculations,  and  its  relative 
performance  compared  to  plane-wave  basis  and  other  real-space  basis  functions  (for  e.g 
Gaussian  basis),  has  remained  an  open  question  to  date. 

A  recent  investigation  in  the  context  of  orbital-free  DFT  has  indicated  that  the  use  of 
higher-order  finite-elements  can  significantly  improve  the  computational  efficiency  of  the 
calculations  [31].  For  instance,  a  100  —  1000  fold  computational  advantage  has  been  re¬ 
ported  by  using  a  fourth-order  finite-element  in  comparison  to  a  linear  finite-element.  In 
the  present  work,  we  extend  this  investigation  to  study  the  Kohn-Sham  DFT  problem  and 
attempt  to  establish  the  computational  efficiency  afforded  by  higher-order  finite-element 
discretizations  in  electronic  structure  calculations.  To  this  end,  we  develop:  (i)  an  a 
priori  mesh  adaption  technique  to  construct  a  close  to  optimal  finite-element  discretiza¬ 
tion  of  the  problem;  (ii)  an  efficient  solution  strategy  for  solving  the  discrete  eigenvalue 
problem  by  using  spectral  finite-elements  in  conjunction  with  Gauss-Lobatto  quadrature, 
and  a  Chebyshev  acceleration  technique  for  computing  the  occupied  eigenspace.  We  sub¬ 
sequently  study  the  numerical  aspects  of  the  finite-element  discretization  of  the  formu¬ 
lation,  investigate  the  computational  efficiency  afforded  by  higher-order  finite-elements, 
and  compare  the  performance  of  the  finite-element  basis  with  plane-wave  and  Gaussian 
basis  on  benchmark  problems. 

The  a  priori  mesh  adaption  technique  proposed  in  this  work  is  based  on  the  ideas 
in  [32,  33],  and  closely  follows  the  recent  development  of  the  mesh  adaption  technique 
for  orbital-free  DFT  [31].  We  refer  to  [26,  27]  for  recently  proposed  a  posteriori  mesh 
adaption  techniques  in  electronic  structure  calculations.  The  mesh  adaption  technique 
proposed  in  the  present  work  is  based  on  minimizing  the  discretization  error  in  the  ground- 
state  energy,  subject  to  a  fixed  number  of  elements  in  the  finite-element  mesh.  To  this 
end,  we  first  develop  an  estimate  for  the  finite-element  discretization  error  in  the  Kohn- 
Sham  ground-state  energy  as  a  function  of  the  characteristic  mesh-size  distribution,  h{ r), 
and  the  exact  ground-state  electronic  fields  comprising  of  wavefunctions  and  electrostatic 
potential.  We  subsequently  determine  the  optimal  mesh  distribution  by  determining  the 
h{ r)  that  minimizes  the  discretization  error.  The  resulting  expressions  for  the  optimal 
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mesh  distribution  are  in  terms  of  the  degree  of  the  interpolating  polynomial  and  the 
exact  solution  fields  of  the  Kohn-Sham  DFT  problem.  Since  the  exact  solution  fields  are 
a  priori  unknown,  we  use  the  asymptotic  behavior  of  the  atomic  wavefunctions  [34]  away 
from  the  nuclei  to  determine  the  coarse-graining  rates  for  the  finite-element  meshes  used 
in  our  numerical  study.  Though  the  resulting  finite-element  meshes  are  not  necessarily 
optimal  near  the  vicinity  of  the  nuclei,  the  mesh  coarsening  rate  away  from  the  nuclei 
provides  an  efficient  way  of  resolving  the  vacuum  in  non-periodic  calculations. 

We  next  implement  an  efficient  solution  strategy  for  solving  the  finite-element  dis¬ 
cretized  eigenvalue  problem,  which  is  crucial  before  assessing  the  computational  efficiency 
of  the  basis.  We  note  that  the  non-orthogonality  of  the  finite-element  basis  results  in  a 
discrete  generalized  eigenvalue  problem,  which  is  computationally  more  expensive  than 
the  standard  eigenvalue  problem  that  results  from  using  an  orthogonal  basis  like  plane- 
waves.  We  address  this  issue  by  employing  a  spectral  finite-element  discretization  and 
Gauss-Lobatto  quadrature  rules  to  evaluate  the  integrals  which  results  in  a  diagonal 
overlap  matrix,  and  allows  for  a  trivial  transformation  to  a  standard  eigenvalue  problem. 
Further,  we  use  the  Chebyshev  acceleration  technique  for  standard  eigenvalue  problems  to 
efficiently  compute  the  occupied  eigenspace  (cf.  e.g.  [35]  in  the  context  of  electronic  struc¬ 
ture  calculations).  Our  investigations  suggest  that  the  use  of  spectral  finite-elements  and 
Gauss-Lobatto  rules  in  conjunction  with  Chebyshev  acceleration  techniques  to  compute 
the  eigenspace  gives  a  10  —  20  fold  computational  advantage,  even  for  modest  materials 
system  sizes,  in  comparison  to  traditional  methods  of  solving  the  standard  eigenvalue 
problem  where  the  eigenvectors  are  computed  explicitly.  Further,  the  proposed  approach 
has  been  observed  to  provide  a  staggering  100  —  200  fold  computational  advantage  over 
the  solution  of  a  generalized  eigenvalue  problem  that  does  not  take  advantage  of  the 
spectral  finite-element  discretization  and  Gauss-Lobatto  quadrature  rules.  In  our  imple¬ 
mentation,  we  use  a  self-consistent  field  (SCF)  iteration  with  Anderson  mixing  [36],  and 
employ  the  finite-temperature  Fermi-Dirac  smearing  [3]  to  suppress  the  charge  sloshing 
associated  with  degenerate  or  close  to  degenerate  eigenstates  around  the  Fermi  energy. 

We  next  study  various  numerical  aspects  of  the  finite-element  discretization  of  the 
Kohn-Sham  DFT  problem.  We  begin  our  investigation  by  examining  the  numerical  rates 
of  convergence  of  higher-order  finite-element  discretizations  of  Kohn-Sham  DFT.  We 
remark  here  that  optimal  rates  of  convergence  have  been  demonstrated  for  quadratic 
hexahedral  and  cubic  serendepity  elements  in  pseudopotential  Kohn-Sham  DFT  calcula¬ 
tions  [20,  26],  and  mathematically  proved  for  Kohn-Sham  DFT  for  the  case  of  smooth 
pseudopotential  external  fields  [37].  We  also  note  that  there  have  been  several  works  on 
the  mathematical  analysis  of  optimal  rates  of  convergence  for  non-linear  eigenvalue  prob¬ 
lems  [38,  39,  40].  However,  the  mathematical  analysis  of  optimal  rates  of  convergence  of 
the  finite-element  discretization  of  Kohn-Sham  DFT  problem  involving  Coulomb-singular 
potentials  is  an  open  question  to  date,  to  the  best  of  our  knowledge.  In  the  present 
study,  we  compute  the  rates  of  convergence  of  the  finite-element  discretization  of  Kohn- 
Sham  DFT  for  a  range  of  finite-elements  including  linear  tetrahedral  element,  hexahedral 
spectral-elements  of  order  two,  four  and  six.  Two  sets  of  benchmark  problems  are  con¬ 
sidered  in  this  study:  (i)  all-electron  calculations  on  boron  atom  and  methane  molecule; 
(ii)  pseudopotential  calculations  on  a  non-periodic  barium  cluster  consisting  of  2  x  2  x  2 
body-centered  cubic  (BCC)  unit  cells  and  a  periodic  face-centered  cubic  (FCC)  calcium 
crystal.  In  these  benchmark  studies,  as  well  as  those  to  follow,  the  proposed  a  priori 
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mesh  adaption  scheme  is  used  to  construct  the  meshes.  These  studies  show  rates  of  con¬ 
vergence  in  energy  of  0(h2k )  for  a  finite-element  whose  degree  of  interpolation  is  k,  which 
denote  optimal  rates  of  convergence  as  demonstrated  in  [20,  37].  An  interesting  aspect  of 
this  study  is  that  optimal  rates  of  convergence  have  been  observed  even  for  all-electron 
calculations  involving  Coulomb-singular  potentials,  which,  to  the  best  of  our  knowledge, 
have  not  been  analyzed  or  reported  heretofore. 

We  finally  turn  towards  assessing  the  computational  efficiency  afforded  by  higher- 
order  finite-element  discretizations  in  Kohn-Sham  DFT  calculations.  To  this  end,  we  use 
the  same  benchmark  problems  and  measure  the  CPU-time  for  the  solution  of  the  Kohn- 
Sham  DFT  problem  to  various  relative  accuracies  for  all  the  finite-elements  considered 
in  the  present  study.  We  observe  that  higher-order  elements  can  provide  a  significant 
computational  advantage  in  the  regime  of  chemical  accuracy.  The  computational  savings 
observed  are  about  1000-fold  upon  using  higher-order  elements  in  comparison  with  a 
linear  finite-element  for  both  all-electron  as  well  as  pseudopotential  calculations.  We 
observe  that  a  point  of  diminishing  returns  is  reached  at  the  sixth-order  for  accuracies 
commensurate  with  the  chemical  accuracy,  where  the  degree  of  freedom  advantage  of 
higher-order  finite-elements  is  nullified  by  the  increasing  per  basis-function  costs.  To 
further  assess  the  effectiveness  of  higher-order  finite-elements,  we  conduct  pseudopotential 
calculations  on  large  aluminium  clusters  ranging  from  3x3x3  to  5x5x5  FCC  unit 
cells  using  the  hexic  spectral  finite-element,  and  compare  the  computational  times  with 
that  of  plane- wave  basis  discretization  using  ABINIT  package  [5].  We  note  that  similar 
relative  accuracies  in  the  ground-state  energies  are  achieved  using  the  hexic  finite-element 
with  a  lower  computational  cost  in  comparison  to  the  plane-wave  basis.  Furthermore, 
we  computed  the  electronic  structure  of  an  aluminum  cluster  of  7  x  7  x  7  FCC  unit 
cells,  containing  1689  atoms,  with  the  finite-element  basis  using  modest  computational 
resources,  which  could  not  be  simulated  in  ABINIT  due  to  huge  memory  requirements. 

We  also  investigate  the  efficiency  of  higher-order  elements  in  the  case  of  all-electron 
calculations  on  a  larger  materials  system  and  compare  it  with  the  Gaussian  basis  using 
the  GAUSSIAN  package  [41].  In  this  case,  the  benchmark  system  is  a  graphene  sheet 
containing  100  atoms.  We  find  that  the  solution  time  using  the  finite-element  basis  is 
larger  by  a  factor  of  10  in  comparison  to  Gaussian  basis,  and  we  attribute  this  differ¬ 
ence  to  the  highly  optimized  Gaussian  basis  functions  for  C-C  bonds  that  resulted  in 
the  far  fewer  basis  functions  required  to  achieve  chemical  accuracy.  While  this  difference 
in  the  performance  can  be  offset  by  the  better  scalability  of  finite-element  discretiza¬ 
tion  on  parallel  computing  platforms,  there  is  also  much  room  for  further  development 
and  optimization  in  the  finite-element  discretization  of  the  Kohn-Sham  DFT  problem. 
For  instance,  especially  in  the  context  of  all-electron  calculations,  the  partitions-of-unity 
finite-element  method  with  atomic-orbital  enrichment  functions  can  significantly  reduce 
the  required  number  of  finite-element  basis  functions  as  recently  demonstrated  in  [42], 
and  presents  an  important  direction  for  further  investigations.  Finally,  we  assess  the 
parallel  scalability  of  our  numerical  implementation.  We  demonstrate  the  strong  scaling 
up  to  192  processors  (limited  by  our  access  to  computing  resources)  with  an  efficiency 
of  91.4%  using  a  172  atom  aluminium  cluster  discretized  with  3.91  million  degrees  of 
freedom  as  our  benchmark  system. 

The  remainder  of  the  paper  is  organized  as  follows.  Section  2  describes  the  vari¬ 
ational  formulation  of  the  Kohn-Sham  DFT  problem  followed  by  a  discussion  on  the 
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discrete  Kohn-Sham  DFT  eigenvalue  problem.  Section  3  develops  the  error  estimates  for 
the  finite-element  discretization  of  Kohn-Sham  DFT,  and  uses  these  estimates  to  present 
an  a  priori  mesh  adaption  scheme.  Section  4  describes  our  numerical  implementation 
of  the  self-consistent  field  iteration  of  the  Kohn-Sham  eigenvalue  problem,  and,  in  par¬ 
ticular,  discusses  the  efficient  methodologies  developed  to  solve  the  Kohn-Sham  DFT 
problem  using  the  finite-element  basis.  Section  5  presents  a  comprehensive  numerical 
study  demonstrating  the  computational  efficiency  afforded  by  higher-order  finite-element 
discretizations  in  electronic  structure  calculations,  and  also  provides  a  performance  com¬ 
parison  of  finite-element  basis  with  plane-wave  and  Gaussian  basis.  We  finally  conclude 
with  a  short  discussion  and  outlook  in  Section  6. 


2  Formulation 

In  this  section,  we  describe  the  Kohn-Sham  DFT  energy  functional  and  the  associated 
variational  formulation.  We  subsequently  review  the  equivalent  self-consistent  formula¬ 
tion  of  the  Kohn-Sham  eigenvalue  problem,  and  present  the  discretization  of  the  formu¬ 
lation  using  a  finite-element  basis. 


2.1  Kohn-Sham  variational  problem 

We  consider  a  material  system  consisting  of  N  electrons  and  M  nuclei.  The  spinless 
Kohn-Sham  energy  functional  describing  the  N  electron  system  is  given  by  [43,  44] 

E{^f ,  R )  =  Ts(pi>)  +  Exc(p)  +  Eh{p)  +  Eext(p,  R)  +  EZZ(R),  (1) 


where 


N 

P(r)  =X^(x)|2 

i—  1 


(2) 


represents  the  electron  density.  In  the  above  expression,  we  denote  the  spatial  coordi¬ 
nate  by  r,  whereas  x  =  (r,  s)  includes  both  the  spatial  and  spin  degrees  of  freedom. 
We  denote  by  =  {^(x),  ^(x),  •  •  •  ,  ^jv(x)}  the  vector  of  orthonormal  single  electron 
wavefunctions,  where  each  wavefunction  ipi  €  X  x  {a,  /?}  can  in  general  be  complex¬ 
valued,  and  comprises  of  a  spatial  part  belonging  to  a  suitable  function  space  X  (elab¬ 
orated  subsequently)  and  a  spin  state  denoted  by  a(s)  or  /3(s).  We  further  denote  by 
R  =  {R\ ,  R‘2 ,  ■  ■  ■  Rm}  the  collection  of  all  nuclear  positions.  The  first  term  in  the 
Kohn-Sham  energy  functional  in  (1),  Ts(\l/),  denotes  the  kinetic  energy  of  non-interacting 
electrons  and  is  given  by 


Ts(^)  =  Ylj  $f(x)  (_^v2)  ^(x)dx »  (3) 

where  ijj*  denotes  the  complex  conjugate  of  t/’j.  Exc  in  the  energy  functional  denotes  the 
exchange-correlation  energy  which  includes  the  quantum-mechanical  many  body  inter¬ 
actions.  In  the  present  work,  we  model  the  exchange-correlation  energy  using  the  local 
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density  approximation  (LDA)  [45,  46]  represented  as 


EXc(p) 


J  £xc(p)p( r)  dr  , 


where  exc(p )  =  ex{p)  +  ec(p),  and 


£x(p)  = 


3 

4 


(4) 


(5) 


£c(p)  = 


(l+/9l y/irs)+fhra) 

A  logrs  +  B  +  Crs  logrs  +  D  rs  rs  <  1, 


(6) 


and  rs  =  (3/47T p)1/3.  Specifically,  we  use  the  Ceperley  and  Alder  constants  as  given 
in  [46].  We  remark  that  we  have  restricted  the  present  formulation  and  study  to  LDA 
exchange-correlation  functionals  solely  for  the  sake  of  clarity,  and  the  formulation  can 
be  trivially  extended  (cf.  e.g.  [24])  to  local  spin  density  approximation  (LSDA)  and 
generalized  gradient  approximation  (GGA)  exchange-correlation  functionals. 

The  electrostatic  interaction  energies  in  the  Kohn-Sham  energy  functional  in  (1)  are 
given  by 


r  —  r 


(7) 


Eext  (p>  di) 


p{r)Vext(r ,  R)  dr 


(8) 


E 


zz 


I,J±I 


ZiZj 
Ri  -  Rj 


(9) 


where  Eh  is  the  Hartree  energy  representing  the  classical  electrostatic  interaction  energy 
between  electrons,  Eext  is  the  interaction  energy  between  electrons  and  the  external 
potential  induced  by  the  nuclear  charges  given  by  Vext  =  ]C  j  Vj(r,  Rj )  with  Vj  denoting 
the  potential  (singular  Coulomb  potential  or  pseudopotential)  contribution  from  the  Jth 
nucleus,  and  Ezz  denotes  the  repulsive  energy  between  nuclei  with  Zj  denoting  the  charge 
on  the  Ith  nucleus.  We  note  that  in  a  non-periodic  setting,  representing  a  finite  atomic 
system,  all  the  integrals  in  equations  (3)-(8)  are  over  M3  and  the  summations  in  (8)-(9) 
include  all  the  atoms  I  and  J  in  the  system.  In  the  case  of  an  infinite  periodic  crystal,  all 
the  integrals  over  r  in  equations  (3)-(8)  extend  over  the  unit  cell,  whereas  the  integrals 
over  r'  extend  in  M3.  Similarly,  in  (8)- (9)  the  summation  over  /  is  on  the  atoms  in  the 
unit  cell,  and  summation  over  J  extends  over  all  lattice  sites.  We  note  that,  in  the  context 
of  periodic  problems,  the  above  expressions  assume  a  single  k-point  (T— point)  sampling. 
The  computation  of  the  electron  density  and  kinetic  energy  in  (2)  and  (3)  for  multiple 
k-point  sampling  involves  an  additional  quadrature  over  the  k-points  in  the  Brillouin  zone 
(cf.  e.g  [47]). 

The  electrostatic  interaction  terms  as  expressed  in  equations  (7)-(9)  are  nonlocal  in 
real-space,  and,  for  this  reason,  evaluation  of  electrostatic  energy  is  the  computationally 
expensive  part  of  the  calculation.  Following  the  approach  in  [48,  24],  the  electrostatic 
interaction  energy  can  be  reformulated  as  a  local  variational  problem  in  electrostatic 


7 


Motamarri,  Nowak,  Leiter,  Knap,  Sz  Gavini 


potential  by  observing  that  ^  is  the  Green’s  function  of  the  Laplace  operator.  To  this  end, 

M 

we  represent  the  nuclear  charge  distribution  by  b( r,  R)  =  —  Ejdn^r),  where  Zi5ri( r) 

1=1 

represents  a  bounded  smooth  charge  distribution  centered  at  Rj,  either  corresponding 
to  a  pseudopotential,  or,  in  the  case  of  all-electron  calculations,  a  regularization  of  the 
point  charge  having  a  support  in  a  small  ball  around  i?/  with  charge  Zj.  The  nuclear 
repulsion  energy  can  subsequently  be  represented  as 


dr  dr' . 


(10) 


We  remark  that,  while  this  differs  from  the  expression  in  equation  (9)  by  the  self-energy  of 
the  nuclei,  the  self-energy  is  an  inconsequential  constant  depending  only  on  the  nuclear 
charge  distribution,  and  is  explicitly  evaluated  and  subtracted  from  the  total  energy 
in  numerical  computations  (cf.  Appendix  C  in  [31]).  Subsequently,  the  electrostatic 
interaction  energy,  up  to  a  constant  self-energy,  is  given  by  the  following  variational 
problem: 


p(r)p(r’ 


r  —  r 


^  dr  dr'  +  /  p(r)Vext(r)  dr  +  \  J  f  dr  dr' 


iey  { i  /  lV<Kr)|2*-/ (p(r)  +  6(r,i?))0(r)dr|  , 


(11) 


where  </>(r)  denotes  the  trial  function  for  the  total  electrostatic  potential  due  to  the 
electron  density  and  the  nuclear  charge  distribution  and  y  is  a  suitable  function  space 
discussed  subsequently. 

Using  the  local  reformulation  of  electrostatic  interactions,  the  Kohn-Sham  energy 
functional  (1)  can  be  rewritten  as 


-E(Tr,  R)  =  sup  L((j),  Tr,  R)  ,  (12) 

<t>ey 

where 

*,R)  =  Ts(*)  +  Exc(p)  -±-J  |  V</>(r)|2  dr  +  J (p( r)  +  b(r,  R))<j>( r)  dr  .  (13) 

Subsequently,  the  problem  of  determining  the  ground-state  energy  and  electron  density 
for  given  positions  of  nuclei  can  be  expressed  as  the  following  variational  problem: 


inf  E(^F,R)  ,  (14) 

where  X  =  {\l/|  (ipi,i^j)xx{a,/3}  =  $ij}  with  (  ,  )xx{a,/3}  denoting  the  inner  product 
defined  on  X  x  X  denotes  a  suitable  function  space  that  guarantees  the  existence 

of  minimizers.  We  note  that  bounded  domains  are  used  in  numerical  computations, 
which  in  non-periodic  calculations  corresponds  to  a  large  enough  domain  containing  the 
compact  support  of  the  wavefunctions  and  in  periodic  calculations  correspond  to  the 
supercell.  We  denote  such  an  appropriate  bounded  domain,  subsequently,  by  U.  For 
formulations  on  bounded  domains,  X  =  y  =  Hq(Q)  in  the  case  of  non-periodic  problems 
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and  X  =  y  =  Hper(Q)  in  the  case  of  periodic  problems  are  appropriate  function  spaces 
which  guarantee  existence  of  solutions  (cf.  e.g.  [24]).  Mathematical  analysis  of  the  Kohn- 
Sham  DFT  problem  proving  the  existence  of  solutions  in  the  more  general  case  of  M3 
(X  =  f71(M3))  has  recently  been  reported  [49]. 

2.2  Kohn-Sham  eigenvalue  problem 

The  stationarity  condition  corresponding  to  the  Kohn-Sham  variational  problem  is  equiv¬ 
alent  to  the  non-linear  Kohn-Sham  eigenvalue  problem  given  by: 

Hi/ii  =  tiipi,  (15) 

where 

n=  ^V2  +  VeS(p,R))j  (16) 

is  a  Hermitian  operator  with  eigenvalues  e*,  and  the  corresponding  orthonormal  eigen¬ 
functions  i pi  for  i  =  1,  2,  ■  ■  ■  ,  N  denote  the  canonical  wavefunctions.  The  electron  density 
in  terms  of  the  canonical  wavefunctions  is  given  by 

N 

P(r)  =  E  \M*)\2  ,  (17) 

i— 1 

and  the  effective  single-electron  potential,  Vcs_{p,R),  in  (16)  is  given  by 

VeS{p,  R)  =  Vext(R )  +  VH(p)  +  Vxc{p)  =  Vext(R)  +  ^  ^  .  (18) 

op  op 

As  discussed  in  Section  2.1,  it  is  efficient  to  compute  the  total  electrostatic  potential, 
defined  as  the  sum  of  the  external  potential  (Ve xt(R))  and  the  Hartree  potential  ( Vh{p )), 
through  the  solution  of  a  Poisson  equation 

-^v2<^(r,  R)  =  p( r)  +  b{r,  R)  , 

which  is  given  by 

(j)(r,R)  =  VH{p) +  Vext(r,R)  =  f  \  dr  +  f  dr  .  (19) 

J  |r  —  r'|  J  |r  —  r'| 

Finally,  the  system  of  equations  corresponding  to  the  Kohn-Sham  eigenvalue  problem  are 
given  by 

^-^V2  +  VeS(p,  R)  \  fa  =  (20a) 

N 

p(r)  =  Yl  i^(x)i2>  (20b) 

%— l 

1  5E 

-  v2<Kr>  R)  =  p(r)  +  Kr>  R)  ;  Ves{p,  R)  =  </Hr,  R)  +  ,  (20c) 

which  have  to  be  solved  with  appropriate  boundary  conditions  based  on  the  problem 
under  consideration.  The  formulation  in  (20)  represents  a  nonlinear  eigenvalue  problem 
which  has  to  be  solved  self-consistently,  and  is  subsequently  discussed  in  Section  4.  Next 
we  turn  to  the  discrete  formulation  of  the  above  Kohn-Sham  eigenvalue  problem. 
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2.3  Discrete  Kohn-Sham  eigenvalue  problem 

If  Xh  represents  the  finite-dimensional  subspace  with  dimension  n^,  the  finite-element 
approximation  of  the  various  field  variables  (spatial  part  of  the  wavefunctions  and  the 
electrostatic  potential)  in  the  Kohn-Sham  eigenvalue  problem  (20)  are  given  by 


nh 


*l>i( r)  =  '52Nj(r)i%  , 

3= 1 

(21) 

nh 

0ft(r)  =  'YjNj(T)4?  , 

(22) 

j= 1 


where  Nj  :  1  <  j  <  n h  denote  the  basis  of  Xf,,.  Subsequently,  the  discrete  eigenvalue 
problem  corresponding  to  (20)  is  given  by 

H Vi  =  e?M^i  ,  (23) 

where  H jk  denotes  the  discrete  Hamiltonian  matrix,  M jk  denotes  the  overlap  matrix  (or 
commonly  referred  to  as  the  mass  matrix  in  finite-element  literature),  and  denotes  ith 
eigenvalue  corresponding  to  the  eigenvector  •Z',;.  The  expression  for  the  discrete  Hamil¬ 
tonian  matrix  H jk  for  a  non-periodic  problem  with  X  =  y  =  i7g(fl)  as  well  as  a  periodic 
problem  on  a  supercell  with  X  =  y  =  Hper(kl)  is  given  by 

H jk  =  \  f  VJVj(r)  -VJVj.(r)  dr  +  [  V^(v,  R)Nj(v)Nk{v)  dr  .  (24) 

1  Jn  J  n 

We  refer  to  [20]  for  the  expression  of  H in  the  case  of  a  periodic  problem  on  a  unit-cell 
using  the  Bloch  ansatz.  The  discrete  Kohn-Sham  eigenvalue  problem  (23)  is  a  generalized 
eigenvalue  problem  with  an  overlap  matrix  M =  f^Nj(r)Nk(r)dr,  which  results  from 
the  non-orthogonality  of  the  finite-element  basis  functions.  However,  the  generalized 
eigenvalue  problem  (23)  can  be  transformed  into  a  standard  Hermitian  eigenvalue  problem 
as  follows.  Since  the  matrix  M  is  positive  definite  symmetric,  there  exists  a  unique 
positive  definite  symmetric  square  root  of  M,  and  is  denoted  by  M1//2.  Hence,  the 


following  holds  true 

H 

=»  H =  e^M1/2M1/2^t 

=>  H  Vi  = 

(25) 

where 

Vi  =  M  1/2<Pri 

H  =  M_1/2HM_1/2 

_ 

We  note  that  H  is  a  Hermitian  matrix,  and  (25)  represents  a  standard  Hermitian  eigen¬ 
value  problem.  The  actual  eigenvectors  are  recovered  by  the  transformation  'k'i  = 
We  remark  that  S& ,  is  a  vector  containing  the  expansion  coefficients  of  the 
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eigenfunction  V’f(r)  expressed  in  an  orthonormal  basis  spanning  the  finite-element  space. 
Furthermore,  we  note  that  the  transformation  to  a  standard  eigenvalue  problem  (25)  is 
computationally  advantageous  only  if  the  matrix  M~1//2  can  be  evaluated  with  modest 
computational  cost.  This  is  readily  possible  by  using  spectral  finite-elements  rather  than 
conventional  finite-elements,  and  is  discussed  in  detail  in  Section  4. 

The  convergence  of  finite-element  approximation  for  the  Kohn-Sham  DFT  model  was 
shown  in  [24]  using  the  notion  of  T— convergence.  We  also  refer  to  the  recent  numerical 
analysis  carried  out  on  finite  dimensional  discretization  of  Kohn-Sham  models  [37],  which 
also  provides  the  rates  of  convergence  of  the  approximation.  Next,  we  derive  the  optimal 
coarse-graining  rates  for  the  finite-element  meshes  using  the  solution  fields  in  the  Kohn- 
Sham  DFT  problem. 


3  A -priori  mesh  adaption 

We  propose  an  a  priori  mesh  adaption  scheme  in  the  spirit  of  [33,  32]  by  minimizing  the 
error  involved  in  the  finite-element  approximation  of  the  Kohn-Sham  DFT  problem  for 
a  fixed  number  of  elements  in  the  mesh.  The  proposed  approach  closely  follows  the  a 
priori  mesh  adaption  scheme  developed  in  the  context  of  orbital- free  DFT  [31].  In  what 
follows,  we  first  derive  a  formal  bound  on  the  energy  error  \E  —  Eh\  as  a  function  of  the 
characteristic  mesh-size  h,  and  the  distribution  of  electronic  fields  (wavefunctions  and 
electrostatic  potential).  We  note  that,  in  a  recent  study,  error  estimates  for  a  generic 
finite  dimensional  approximation  of  the  Kohn-Sham  model  have  been  derived  [37] .  How¬ 
ever,  the  forms  of  these  estimates  are  not  useful  for  developing  mesh-adaption  schemes 
as  the  study  primarily  focused  on  proving  the  convergence  of  the  finite-dimensional  ap¬ 
proximation  and  determining  the  convergence  rates.  We  first  present  the  derivation  of 
an  error  bound  in  terms  of  the  canonical  wavefunctions  and  the  electrostatic  potential, 
and  subsequently  develop  an  a  priori  mesh  adaption  scheme  based  on  this  error  bound. 

3.1  Estimate  of  energy  error 

In  the  present  section  and  those  to  follow,  we  demonstrate  our  ideas  on  a  system  con¬ 
sisting  of  2N  electrons  for  the  sake  of  simplicity  and  notational  clarity.  Let  (vL  = 

{'F\  1^2  ■■■  Fn)  ,  4>h  ,  e*  =  {e^  ,  4  ■  •  ■  £%})  and  (#  =  {ipi ,  fa  •  •  •  * }n},  4>,e  =  {e1,e2  ■■■  e^}) 
represent  the  solutions  (spatial  part  of  canonical  wavefunctions,  electrostatic  potential, 
eigenvalues)  of  the  discrete  finite-element  problem  (23)  and  the  continuous  problem  (20) 
respectively.  In  the  following  derivation  and  henceforth  in  this  article,  we  consider  all 
wavefunctions  to  be  real-valued  and  orthonormal.  We  note  that  it  is  always  possible 
to  construct  real-valued  orthonormal  wavefunctions  for  both  non-periodic  problems  as 
well  as  periodic  problems  on  the  supercell.  The  wavefunctions  are  complex-valued  for 
periodic  problems  on  a  unit-cell  (with  multiple  k-points  using  the  Bloch  ansatz),  and  the 
following  approach  is  still  valid,  but  results  in  more  elaborate  expressions  for  the  error 
bounds.  Using  the  local  reformulation  of  electrostatic  interactions  in  the  Kohn-Sham 
energy  functional  (equations  (12)-(13)),  the  ground-state  energy  in  the  discrete  and  the 
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continuous  problem  can  be  expressed  as: 
TV 


Eh(*h,fi')  =  2j2  I  l-\V$\2dr+f  F(p(*h))dr-±-  f  \Vfih\2  dr+  f  (p(*h)  +  b)0h  dr 

■ j  i/  iT2  ^  • '  lT2  • '  iT2  w 


(26) 


£(#,<£)  =  2^  /  I|V^|2dr+  f  F(p(*))dr--i-  j  |V^|2dr  +  f  (p(#)  +  fe^dr, 

■ |  • '  ft  */  if2  •'  iT^  • '  -  t 

(27) 


where 


^(p)  =  e*c(p)p  • 


Proposition  3.1.  /n  tde  neighborhood  of  (^,cf>,e),  the  finite- element  approximation 
error  in  the  ground-state  energy  can  be  bounded  as  follows: 


Eh  -  E\  <  2  V'l'i  f  \\7 5fii\2  dr  +  a  [  (Sipf)2  dr 

i=i  dj? 

f  (8fii)24>  dr  +2  f  id  fii  6  <p  dr 

J  ft  J  ft 


+ 


+  8 


+ 


1 


IS 7 


+  ^  y  iwr* 


(28) 


/0*-w*»(e 


dr 


Proof.  We  first  expand  Eh(fi$>h ,  <f>h)  about  the  solution  of  the  continuous  problem,  i.e 


Jw-.h 


^  =  T'  +  and  <jr  =  <fi  +  &/>,  and  we  get 


N 

Eh(*  +  5*,$  +  54>)  =  2V  [  +  Sfiifdr  +  [  F  (p(*  +  5*))  dr 

~f  J Q  2  dj? 


-T  [  \V{fi  +  5fi)\2dr+  [  (p{4f  +  S^f)  +  b)(fi  +  8fi)dr, 
8vr  dfi 

(29) 

which  can  then  be  simplified,  using  the  Taylor  series  expansion,  to 
N 

Eh(*h ,4>h)  =  2^  /  h|V^j|2  +  \S78ifi\2  +  2Vfii  ■  VSiff)  dr  +  [  F(p(*))dr 


i=  1 


'i? 


iV 


TV 


TV 


+  4E  /  ^(pW)^^i*  +  8  /  F"(p(*))  dr  +  2V  /  p'(p(#))(<5^)2 

l=i^  ^  \*=i  /  i=i^ 

N 

-  T  J  (|V0|2  +  |Vd0|2  +  2V0  •  V^)  dr  +  J  (p(¥)  +  b)4>  dr  +  4^  J  ^  <5^  ^  dr 


TV 


TV 


+  f  (p(\l>)  +  6)<5</>  dr  +  2  V  f  {5f>i)2(j)  dr +  4^  f  t/’i  8fii  8<f  dr  +  0(Sipf,  dfifdfi) 
J  ft  i=\  ^  t?  i=  \  ^  t? 


(30) 


dr 
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1,  ■  ■  ■  ,N. 


We  note  that  (^,0,e)  satisfy  the  following  Euler-Lagrange  equations  for  each  i  = 

f 

J  r 


■  VSipi  dr  +  6^  dr  +  V’i  Hi  4>  dr  =  e,  /  tpi  Sipi  dr  ,  (31a) 

J  n  J  n 

~TZ  [  V<^  •  VH  dr+  [  (p(V)  +  b)5(f>  dr  =  0  .  (31b) 

ft  J  Q 


47 r 


Using  (30)  and  the  Euler-Lagrange  equations  (31),  we  get 

N  „ 


Eh  —  E  =  2  ^ 


i= 1 


n 


\N5i/.’i\2  +  2  Zifa  5k>i  +  F\pH))(6^) 


N 


dr  +  8  ^  F"(p( ¥))  f  £  ^  j  dr 


TV  r 

7-  I  I V «50|2  dr  +  2  [  (S^i)2(p  dr +  2  f 

87r  Ti?  LT«  Jr. 


dr  +  2  /  ifii  Sipi  5(f>  dr 
n 


+  0(5if)f ,  5i/j2  84>)  . 


The  orthonormality  constraint  functional  in  the  discrete  form  is  given  by 

cHh)  =  [  tfrfdr-Sij  , 

Jn 

and  upon  expanding  about  the  solution  \Dl  we  get 


c(T,/')  =  /  (ipi  +  +  S'lpj)  dr  -  S., 


in 


=  /  [ipi'ipj  +  +  dipjtpi  +  d'lpid'i/jj]  dr  - 

Jn 


Using 


/  i/iiipj  dr  =  5ij  , 
J  n 


and  c(\Df/l)  =  0  in  (35),  we  get  for  i  =  j 

2  [  JjiHi  dr  =  -  [  (S-ipi)2  dr  i  =  1,  2, . . . ,  N  . 

J  Q  J  Q 

Using  equations  (32)  and  (37),  we  arrive  at  the  following  error  bound  in  energy 

N 


(32) 

(33) 

(34) 

(35) 

(36) 

(37) 


\Eh-E\<2^[\J  \NHi\2  dr  - 

/ 

Jr 


f-i  /  (<%)2dr 


in 


+ 


+  8 


( Hi)2(t>  dr 


ipi  5ipi  5(j)  dr 


+ 


1 


F\p{H){Hi)2  dr 


in 


+  —  /  |V(50p  dr 

87r  Jn 


fnF"(p(*))  dr 


□ 
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Proposition  3.2.  The  finite- element  approximation  error  in  proposition  3.1  expressed  in 
terms  of  the  approximation  errors  in  electronic  wave-functions  and  electrostatic  potential 
is  given  by 


\Eh~E\  <  C  ^E  ||  fa  -  1$  H^n  +1^-  ^li.n  +  E  ||  ||o,n||  $  ~  4>h  lli.aj  (38) 

Proof.  We  use  the  following  norms:  |  •  |i  p  represents  the  semi-norm  in  H 1  space,  ||  •  ||pp 
denotes  the  H 1  norm,  ||  •  ||op  and  ||  •  ||o,p,o  denote  the  standard  L 2  and  Lp  norm 
respectively.  All  the  constants  to  appear  in  the  following  estimates  are  positive  and 
bounded.  Firstly,  we  note  that 


E  \  [  i  v^i2 *  <  ^  E  ^  -  $*ii,n > 

i  i 

El^l  f  Wi)2dr  =  Yj\ei\  [  (fii  -  iff  f  dr  <  C2  E  II  V’*  ~ 
Jn  Jn 


(39) 


■?  Ilia  •  (40) 


Using  Cauchy-Schwartz  and  Sobolev  inequalities,  we  arrive  at  the  following  estimate 


E 


in 


F\p(^))(8^)2  dv 


< 


E/ 

:  Jn 


dr 


CsE  II  *>(*))  llo, nil  llo, a 

i 

C'sEll  F'^))  llo, a ||  i>i  —  $  Ho, 4, a 


<  C3  E  II  IPi  ~ 


h  1 1 2 
i  lll,Q 


(41) 


Further,  we  note 


1 


|V(0  -fih)\2  dr  <C4\fi 


h\2 


8vr  jQ 

Using  Cauchy-Schwartz  and  Sobolev  inequalities  we  arrive  at 


E 


>n 


dr 


<  E  /  -  ^)2  ^  <  E 

■  v  • 


0  llo, all  (V’i  -  )2  llo,: 


(42) 


<  C5E  II  ^  ^ 


h  11 2 

i  llo, 4, a 


<  C5  E  II  V’i  -  i’i 


h  11 2 

i,a  • 


(43) 
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Also,  we  note  that 


E 


<n 


'i/’i  Sipi  5<j)  dr 


< 


E 


Q 


dr 


Mi’i  -  $)(<!>  -  4>h) 

<  E  II  i’i  ||o,6,n||  4>i  —  i’i  ||o,n||  4>  —  4>h  l|o,3,n 

i 

\\ -ipi  - -ipi  \\o, n\\  4>  -  (ph  \\ip  ,  (44) 


where  we  made  use  of  the  generalized  Holder  inequality  in  the  first  step  and  Sobolev 
inequality  in  the  next.  Finally,  we  use  Cauchy- Schwartz  inequality  to  arrive  at 

J V(pW)  ^><5^  dr  -  jn  If"(pW)I  (ei^I2)  (Ei^i2)  dr  (45) 

=  E  [  |*"(p(*))p(*)(W2|  dr 


<C7Y: 


(46) 

(47) 


Using  the  bounds  derived  above,  it  follows  that 


\Eh  —  E\  <  C  ||  ipi-ip?  ||?>n  +\<j>-  4>h\lp  +  E  II  ^  “  $  Ik nil  <P~  4>h  lli.nj  (48) 


□ 


We  now  bound  the  finite-element  discretization  error  with  interpolation  errors,  which 
in  turn  can  be  bounded  with  the  finite-element  mesh  size  h.  This  requires  a  careful 
analysis  in  the  case  of  Kohn-Sham  DFT  and  has  been  discussed  in  [37].  Using  the  results 
from  the  proof  of  Theorem  4.3  in  [37],  we  bound  the  estimates  in  equation  (48)  using  the 
following  inequalities  (cf.  [50]) 


1  'Pi  -  <  Col'pi  -  tpi  \i,n  <  CoE>^k+i.n.  > 

(49a) 

II  4>i  -  i’i  ||o,n<  Ci  ||  in  -  ipl  ]|0,n<  Ci  Y  hke+l |^|fc+i,ne  , 

e 

(49b) 

\4>  -  4>h\i,n  <C2\4>-  </>7|i,n  <C2£  hk\4>\k+ i,ne  , 

g 

(49c) 

II  4>  -  4>h  ||o,n<  C2H-  (p1  ||0)n<  C3  Y  he+1  Mk+i,ne  , 

e 

(49d) 

where  k  is  the  order  of  the  polynomial  interpolation,  and  e  denotes  an  element  in  the 
regular  family  of  finite-elements  [50]  with  mesh-size  he  covering  a  domain  kle.  Using  the 
above  estimates,  the  error  estimate  to  0(h2k+1 )  is  given  by 

\Eh  -  E  <  C  E  h2k  E  1^1  W +  1^11+1, ne  • 

(50) 

e  L  i 
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3.2  Optimal  coarse-graining  rate 

Following  the  approach  in  [33],  we  seek  to  determine  the  optimal  mesh-size  distribution 
by  minimizing  the  approximation  error  in  energy  for  a  fixed  number  of  elements.  Using 
the  definition  of  the  semi-norms,  we  rewrite  equation  (50)  as 

(51) 

where  Ne  denotes  the  total  number  of  elements  in  the  finite-element  triangulation,  and 
Dk+1  denotes  the  (k  +  l)th  derivative  of  any  function.  An  element  size  distribution 
function  h( r)  is  introduced  so  that  the  target  element  size  is  defined  at  all  points  r  in  Q, 
and  we  get 


I E, 


Ne 

-E\<CYj  [hf  /  [E  \Dk+1$i(r)\2  +  I  Dk+1$(r)f 


e=l 


dr 


Ne 

Eh~E\<Cj^  \h?\Y,\Dk+1 


Mr)\'2  +  \Dk+1m 


<  C  J  h2k{ r)  [E  \Dk+%(r)\2  +  \Dk+1 4>(r)f 


dr 

dr . 


(52) 

(53) 


Further,  the  number  of  elements  in  the  mesh  is  in  the  order  of 

dr 


NP_  oc 


/ 

Jn 


h3(r) 


(54) 


The  optimal  mesh-size  distribution  is  then  determined  by  the  following  variational  prob¬ 
lem  which  minimizes  the  approximation  error  in  energy  subject  to  a  fixed  number  of 
elements: 


mm 

h 


jQ{h2k[r)  [E  l^'+1^(r)|2  +  \Dk+1^)\2]  }  dr  subject  to  :  J ^  ^  =  Ne  .  (55) 


The  Euler-Lagrange  equation  associated  with  the  above  problem  is  given  by 


2kh2k-\r)  [E  \Dk+1Mr)\2  +  \Dk+1$(r)f 


3ri 

hH?) 


=  0. 


(56) 


where  rj  is  the  Lagrange  multiplier  associated  with  the  constraint.  Thus,  we  obtain  the 
following  distribution 

h(r)  =  ^(E  l^+1<Mr)|2  +  |JDfc+1^(r)|2)"1/(2fe+3)  ,  (57) 

i 

where  the  constant  A  is  computed  from  the  constraint  that  the  total  number  of  elements 
in  the  finite-element  discretization  is  Ne. 

The  coarse-graining  rate  derived  in  equation  (57)  has  been  employed  to  construct  the 
finite-element,  meshes  by  using  the  a  priori  knowledge  of  the  asymptotic  solutions  of  'tpi(r) 
and  (j)( r)  for  different  kinds  of  problems  we  study  in  the  subsequent  sections. 
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4  Numerical  implementation 

We  now  turn  to  the  numerical  implementation  of  the  discrete  formulation  of  the  Kohn- 
Sham  eigenvalue  problem  described  in  Section  2.  We  first  discuss  the  higher-order  finite- 
elements  used  in  our  study  with  specific  focus  on  spectral  finite-elements,  which  are 
important  in  developing  an  efficient  numerical  solution  procedure. 

4.1  Higher-order  finite-element  discretizations 

Linear  finite-element  basis  has  been  extensively  employed  for  a  wide  variety  of  appli¬ 
cations  in  engineering  involving  complex  geometries  and  moderate  levels  of  accuracy. 
On  the  other  hand,  much  higher  levels  of  accuracy  (chemical  accuracy)  is  desired  in 
electronic  structure  computations  of  materials  properties.  To  achieve  the  desired  chem¬ 
ical  accuracy,  a  linear  finite-element  basis  is  computationally  inefficient  since  it  requires 
a  large  number  of  basis  functions  per  atom  [29,  22],  Hence,  we  investigate  if  higher- 
order  finite-element  basis  functions  can  possibly  be  used  to  efficiently  achieve  the  desired 
chemical  accuracy.  To  this  end,  we  employ  in  our  study  C°  basis  functions  comprising  of 
linear  tetrahedral  element  (TET4)  and  spectral  hexahedral  elements  up  to  degree  eight 
(HEX27,  HEX125SPECT,  HEX343SPECT,  HEX729SPECT).  The  numbers  following 
the  words  ‘TET’  and  ‘HEX’  denote  the  number  of  nodes  in  the  element,  and  the  suffix 
‘SPECT’  denotes  that  the  element  is  a  spectral  finite-element.  We  note  that  spectral 
finite-elements  [51,  52]  have  been  employed  in  a  previous  work  in  electronic  structure  cal¬ 
culations  [30],  but  the  computational  efficiency  afforded  by  these  elements  has  not  been 
thoroughly  studied.  We  first  briefly  discuss  spectral  finite-elements  (also  referred  to  as 
spectral-elements)  employed  in  the  present  work  and  the  role  they  play  in  improving  the 
computational  efficiency  of  the  Kohn-Sham  DFT  eigenvalue  problem. 

The  spectral-element  basis  functions  employed  in  the  present  work  are  constructed 
as  Lagrange  polynomials  interpolated  through  an  optimal  distribution  of  nodes  corre¬ 
sponding  to  the  roots  of  derivatives  of  Legendre  polynomials,  unlike  conventional  finite- 
elements  which  use  equispaced  nodes  in  an  element.  Such  a  distribution  does  not  have 
nodes  on  the  boundaries  of  an  element,  and  hence  it  is  common  to  append  nodes  on  the 
element  boundaries  which  guarantees  C°  basis  functions.  These  set  of  nodes  are  usually 
referred  to  as  Gauss-Lobatto-Legendre  points.  Furthermore,  we  note  that  conventional 
finite-elements  result  in  a  poorly  conditioned  discretized  problem  for  a  high  order  of  in¬ 
terpolation,  whereas  spectral-elements  are  devoid  of  this  deficiency  [52],  The  improved 
conditioning  of  the  spectral-element  basis  was  observed  to  provide  a  2-3  fold  compu¬ 
tational  advantage  over  conventional  finite-elements  in  a  recent  benchmark  study  [31] 
conducted  to  assess  the  computational  efficiency  of  higher-order  elements  in  the  solution 
of  the  orbital-free  DFT  problem. 

A  significant  advantage  of  the  aforementioned  spectral-elements  is  realized  when  we 
conjoin  their  use  with  specialized  Gaussian  quadrature  rules  that  have  quadrature  points 
coincident  with  the  nodes  of  the  spectral-element,  which  in  the  present  case  corresponds 
to  the  Gauss-Lobatto-Legendre  (GLL)  quadrature  rule  [53].  Importantly,  the  use  of  such 
a  quadrature  rule  will  result  in  a  diagonal  overlap  matrix  (mass  matrix)  M.  To  elaborate, 
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consider  the  elemental  mass  matrix  Me  given  by 


m^ONji^C)  det(Je)  d£  drjdC 


nq 

=  ^  ]  'U,p,q,rNi(£lp,T]q,  (^GNj^p,  T]q,  G)  det(Je) 

p,q,r= 0 


(58) 

(59) 


where  (£,77,  £)  represents  the  barycentric  coordinates,  Je  represents  the  elemental  jaco- 
bian  matrix  of  an  element  kle,  and  nq  denotes  the  number  of  quadrature  points  in  each 
dimension  in  a  tensor  product  quadrature  rule.  Since  the  quadrature  points  are  coinci¬ 
dent  with  nodal  points,  the  above  expression  is  non-zero  only  if  i  =  j,  thus  resulting  in 
a  diagonal  elemental  mass  matrix  and  subsequently  a  diagonal  global  mass  matrix.  A 
diagonal  mass  matrix  makes  the  transformation  of  the  generalized  Kohn-Sham  eigenvalue 
problem  (23)  to  a  symmetric  standard  eigenvalue  problem  (25)  trivial.  As  discussed  and 
demonstrated  subsequently,  the  transformation  to  a  standard  eigenvalue  problem  allows 
us  to  use  efficient  solution  procedures  to  compute  the  eigenspace  in  the  self-consistent 
held  iteration.  We  note  that,  while  the  use  of  the  GLL  quadrature  rule  is  important 
in  efficiently  transforming  the  generalized  eigenvalue  problem  to  a  standard  eigenvalue 
problem,  this  quadrature  rule  is  less  accurate  in  comparison  to  Gauss  quadrature  rules. 
An  n  point  Gauss-Lobatto  rule  can  integrate  polynomials  exactly  up  to  degree  2 n  —  3, 
while  an  n  point  Gauss  quadrature  rule  can  integrate  polynomials  exactly  up  to  degree 
2n  —  1.  Thus,  in  the  present  work,  we  use  the  GLL  quadrature  rule  only  in  the  evaluation 
of  the  overlap  matrix,  while  using  the  more  accurate  Gauss  quadrature  rule  to  evaluate 
the  discrete  Hamiltonian  matrix  H. 


4.2  Self-consistent  field  iteration 

As  noted  in  Section  2,  the  Kohn-Sham  eigenvalue  problem  represents  a  nonlinear  eigen¬ 
value  problem  and  must  be  solved  self-consistently  to  compute  the  ground-state  electron 
density  and  energy.  We  use  computationally  efficient  schemes  to  evaluate  the  occupied 
eigenspace  of  the  Kohn-Sham  Hamiltonian  (discussed  below)  in  conjunction  with  finite 
temperature  Fermi-Dirac  distribution  and  charge  density  mixing  to  develop  an  efficient 
and  robust  solution  scheme  for  the  self-consistent  held  iteration  of  Kohn-Sham  problem. 


Algorithm  1  depicts  the  typical  steps  involved  in  the  self-consistent  held  (SCF)  iter¬ 
ation.  An  initial  guess  of  the  electron  density  held  is  used  to  start  the  computation.  A 
reasonable  choice  of  such  an  initial  guess  is  the  superposition  of  atomic  charge  densities, 
and  is  used  in  the  present  study.  The  input  charge  density  (pf„(r))  to  a  self-consistent 
iteration  is  used  to  compute  the  total  electrostatic  potential  0(r,  R )  by  solving  the  follow¬ 
ing  discrete  Poisson  equation  using  a  preconditioned  conjugate  gradient  method  provided 
by  PETSc  [54]  package: 


i  /*  r 

E  [4Z  /  VAriW  *]  4>k  =  /  +  b( r,  R))  N3(r)  dr  . 

k=l  ^  ^  ^  ^ 


(61) 


Subsequently,  the  effective  potential  Ves  is  evaluated  to  set  up  the  discrete  Kohn-Sham 
eigenvalue  problem  (23).  We  now  discuss  the  different  strategies  we  have  investigated  to 
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Algorithm  1  Self  Consistent  Field  Iteration 

1.  Provide  initial  guess  for  electron  density  pg(r)  011  the  finite-element  mesh.  This  will  be 
the  input  electron  density  for  the  first  self-consistent  iteration  (p^(r)  =  p(}( r)). 

2.  Compute  the  total  electrostatic  potential  (j)h(r,  R)  =  Vn(P[n( r))  +  Vext(b(r,  R))  by  solving 
the  discrete  Poisson  equation. 

3.  Compute  the  effective  potential,  Ceff(pfn,  R )  =  ^c(Pin)  +  0h(r,^)  • 

4.  Solve  for  the  occupied  subspace  spanned  by  the  eigenfunctions  Vy/l(r),  i  —  1,  2  •  •  •  N,  corre¬ 
sponding  to  N  (N  >  N/ 2)  smallest  eigenvalues  of  the  Kohn-Sham  eigenvalue  problem  (23). 

5.  Calculate  the  fractional  occupancy  factors  (/*)  using  the  Fermi-Dirac  distribution  (Sec¬ 
tion  (4.2.2)) 

6.  Compute  the  new  output  charge  densities  p^ut  from  the  eigenfunctions: 

PoutO)  =  25^/(cAf)  |^(r)|2,  (60) 


7.  If  ||Pout(r)  —  Phi(r)ll  —  tolerance,  stop ;  Else,  compute  new  pfn  using  a  mixing  scheme 
(Section  4.2.3)  and  go  to  step  2. 


compute  the  occupied  eigenspace  of  the  Kohn-Sham  Hamiltonian  H.  and  their  relative 
merits. 

4.2.1  Solver  strategies  for  finding  the  occupied  eigenspace 

We  examined  two  different  solution  strategies  to  compute  the  occupied  subspace:  (i)  ex¬ 
plicit  computation  of  eigenvectors  at  every  self-consistent  field  iteration;  (ii)  A  Chebyshev 
filtering  approach. 

4. 2. 1.1  Explicit  computation  of  eigenvectors:  We  first  discuss  the  methods 
examined  in  the  present  work  that  involve  an  explicit  computation  of  eigenvectors  at  a 
given  self-consistent  iteration.  We  recall  that  the  discrete  Kohn-Sham  eigenvalue  problem 
is  a  generalized  Hermitian  eigenvalue  problem  (GHEP)  (23).  As  mentioned  previously, 
by  using  the  GLL  quadrature  rules  for  the  evaluation  of  the  overlap  matrix  M,  which 
results  in  a  diagonal  overlap  matrix,  the  generalized  eigenvalue  problem  can  be  trivially 
transformed  into  a  standard  Hermitian  eigenvalue  problem  (SHEP).  We  have  explored 
both  approaches  in  the  present  work,  i.e.  (i)  solving  the  generalized  eigenvalue  problem 
employing  conventional  Gauss  quadrature  rules;  (ii)  solving  the  transformed  standard 
eigenvalue  problem  by  using  GLL  quadrature  rules  in  the  computation  of  overlap  matrix. 

We  have  employed  the  Jacobi-Davidson  (JD)  method  [55]  to  solve  the  GHEP.  The 
JD  method  falls  into  the  category  of  iterative  orthogonal  projection  methods  where  the 
matrix  is  orthogonally  projected  into  a  lower  dimensional  subspace  and  one  seeks  an 
approximate  eigenpair  of  the  original  problem  in  the  subspace.  The  basic  idea  in  JD 
method  is  to  arrive  at  better  approximations  to  eigenpairs  by  a  systematic  expansion  of 
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the  subspace  realized  by  solving  a  “Jacobi-Davidson  correction  equation”  that  involves 
the  solution  of  a  linear  system.  The  correction  equation  is  solved  only  approximately, 
and  this  approximate  solution  is  used  for  the  expansion  of  the  subspace.  Though  the 
JD  method  has  significant  advantages  in  computing  the  interior  eigenvalues  and  closely 
spaced  eigenvalues,  we  found  the  JD  method  to  be  computationally  expensive  for  systems 
involving  the  computation  of  eigenvectors  greater  than  50,  due  to  the  increase  in  the 
number  of  times  the  correction  equation  is  solved. 

On  the  other  hand,  we  employed  the  Krylov-Schur  (KS)  method  [56]  for  solving  the 
SHEP.  In  practice,  one  could  also  use  the  JD  method  to  solve  the  SHEP,  but,  as  previ¬ 
ously  mentioned,  the  JD  method  is  expensive  to  solve  systems  involving  few  hundreds 
of  electrons  and  beyond.  The  KS  method  can  be  viewed  as  an  improvement  over  tra¬ 
ditional  Krylov  subspace  methods  such  as  Arnoldi  and  Lanczos  methods  [57,  58].  The 
KS  method  is  based  on  Krylov-Schur  decomposition  where  the  Hessenberg  matrix  has 
the  Schur  form.  The  key  idea  of  the  KS  method  is  to  iteratively  construct  the  Krylov- 
subspace  using  Arnoldi  iteration  and  subsequently  filter  the  unwanted  spectrum  from 
the  Krylov-Schur  decomposition.  This  results  in  a  robust  restarting  scheme  with  faster 
convergence  in  most  cases. 

We  now  demonstrate  the  computational  efficiency  realized  by  solving  the  discrete 
Kohn-Sham  eigenvalue  problem  as  a  transformed  SHEP  in  comparison  to  GHEP.  To  this 
end,  we  consider  an  all-electron  simulation  of  a  graphene  sheet  containing  16  atoms  with 
96  electrons  (N  =  96)  and  a  pseudopotential  simulation  of  3  x  3  x  3  face-centered-cubic 
aluminum  nano-cluster  containing  172  atoms  with  516  electrons  (N  =  516)  as  benchmark 
systems.  The  relative  error  in  the  ground-state  energy  for  the  finite-element  mesh  used 
in  the  case  of  graphene  is  around  1.2  x  10-5  while  it  is  around  3.6  x  10~6  in  the  case  of 
aluminium  cluster.  The  reference  ground-state  energy  is  obtained  using  the  commercial 
code  GAUSSIAN  in  the  case  of  the  all-electron  simulation  of  the  graphene  system,  while  it 
is  obtained  using  the  convergence  study  presented  in  Section  5  for  the  aluminium  cluster. 
Table  1  shows  the  computational  time  taken  for  the  first  SCF  iteration  in  each  of  the 
above  cases.  All  the  times  reported  in  the  present  work  represent  the  total  CPU  times. 
The  Jacobi-Davidson  method  for  GHEP  and  Krylov-Schur  method  for  SHEP  provided 

Table  1:  Comparison  of  Generalized  vs  Standard  eigenvalue  problems 


Element  Type 

DOFs 

Problem  Type 

N 

Time  (GHEP) 

Time  (SHEP) 

HEX125SPECT 

1368801 

graphene 

96 

1786  CPU-hrs 

150  CPU-hrs 

HEX343SPECT 

2808385 

A1  3  x  3  x  3  cluster 

516 

2084  CPU-hrs 

80  CPU-hrs 

by  SLEPc  package  [59]  have  been  employed  in  the  present  study.  It  is  interesting  to  note 
that  a  10-fold  speedup  is  realized  by  transforming  the  Kohn-Sham  eigenvalue  problem 
to  a  SHEP  in  the  case  of  graphene,  while  a  25-fold  speedup  was  obtained  in  the  case  of 
aluminium  cluster.  We  note  that  a  similar  observation  was  recently  reported  in  [26]  where 
the  GHEP  was  transformed  to  SHEP  via  the  mass-lumping  approximation.  Further,  other 
simulations  conducted  as  part  of  the  present  study  suggest  that  this  speedup  increases 
with  increasing  system  size. 
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4. 2. 1.2  Chebyshev  filtering:  We  now  examine  the  alternate  approach  of  Cheby- 
shev  filtering  proposed  in  [35],  which  is  designed  to  iteratively  compute  the  occupied 
eigenspace  at  every  SCF  iteration.  We  note  that  the  Chebyshev  filtering  approach  is 
only  valid  for  standard  eigenvalue  problems.  To  this  end,  we  use  the  aforementioned 
approach  to  convert  the  GHEP  to  a  SHEP  by  employing  the  GLL  quadrature  rules  in 
computing  the  overlap  matrix,  and  remark  that  the  use  of  spectral  elements  in  conjunc¬ 
tion  with  the  GLL  quadrature  is  crucial  in  using  the  Chebyshev  filtering  technique  to 
solve  the  Kohn-Sham  eigenvalue  problem  in  a  finite-element  basis.  The  Chebyshev  fil¬ 
tering  approach  is  based  on  a  subspace  iteration  technique,  where  an  initial  subspace  is 
acted  upon  by  a  Chebyshev  filter  constructed  from  the  Kohn-Sham  Hamiltonian  that 
transforms  the  subspace  to  the  occupied  eigenspace. 

In  the  present  work,  at  any  given  SCF  iteration,  we  begin  with  the  initial  subspace  V 
formed  from  the  eigenvectors  of  the  previous  SCF  iteration.  We  note  that,  as  is  the  case 
with  all  subspace  iteration  techniques,  we  choose  the  dimension  of  the  subspace  V,  N, 
to  be  larger  than  the  number  of  filled  ground-state  orbitals.  Typically,  we  choose  N  ~ 
^  +  20.  This  is  also  necessary  to  employ  the  finite  temperature  Fermi-Dirac  smearing, 
discussed  in  Section  4.2.2,  to  stabilize  the  SCF  iterations  in  materials  systems  that  have 
very  small  band-gaps  or  have  degenerate  states  at  the  Fermi  energy.  As  proposed  in  [35], 
the  Chebyshev  filter  is  constructed  from  a  shifted  and  scaled  Hamiltonian,  H  =  ciH  +  c2, 
where  H  is  the  transformed  Hamiltonian  in  the  SHEP  (cf.  equation  (25)).  The  constants 
ci  and  C2  which  correspond  to  the  scaling  and  shifting  are  determined  such  that  the 
unwanted  eigen-spectrum  is  mapped  into  [—1, 1]  and  the  wanted  spectrum  into  (—00,  — 1). 
In  order  to  compute  these  constants,  we  need  estimates  of  the  upper  bounds  of  the 
wanted  and  unwanted  spectrums.  The  upper  bound  of  the  unwanted  spectrum,  which 
corresponds  to  the  largest  eigenvalue  of  H,  can  be  obtained  inexpensively  by  using  a  small 
number  of  iterations  of  the  Lanczos  algorithm.  The  upper  bound  of  the  wanted  spectrum 
is  chosen  as  largest  Rayleigh  quotient  of  H  in  the  space  V  from  the  previous  SCF  iteration. 
Subsequently,  the  degree- m  Chebyshev  filter,  pm( H),  which  magnifies  the  spectrum  of  H 
in  (—00,  —1)  -the  wanted  eigen-spectrum  of  H  —transforms  the  initial  subspace  V  to  the 
occupied  eigenspace  of  H.  The  degree  of  the  Chebyshev  filter  and  the  number  of  filters 
employed  are  chosen  such  that  the  obtained  space  is  a  close  approximation  of  the  occupied 
space,  with  the  residuals  in  the  eigenvalue  problem  below  a  prescribed  tolerance.  We 
note  that  the  action  of  the  Chebyshev  filter  on  V  can  be  performed  recursively,  similar 
to  the  recursive  construction  of  the  Chebyshev  polynomials  [60].  After  obtaining  the 
occupied  eigenspace,  we  orthogonalize  the  basis  functions,  and  subsequently  project  H 
into  the  eigenspace  to  compute  the  eigenvalues  that  are  used  in  the  Fermi-Dirac  smearing 
discussed  in  the  next  subsection. 

We  now  compare  the  computational  times  taken  for  a  single  SCF  iteration  solved 
using  an  eigenvalue  solver  based  on  Krylov-Schur  method  and  the  Chebyshev  filter  using 
the  aforementioned  benchmark  problems  comprising  of  a  16-atom  graphene  sheet  and 
172-atom  aluminium  cluster.  We  use  a  Chebyshev  polynomial  of  degree  800  for  the 
graphene  all-electron  calculation  and  a  polynomial  degree  of  12  for  aluminum  cluster 
pseudopotential  calculation  respectively.  We  remark  that  the  degree  of  the  polynomial 
required  for  the  Chebyshev  filter  depends  on  the  largest  eigenvalue  of  the  Hamiltonian  H, 
which  controls  the  ratio  of  the  wanted  to  the  unwanted  eigen-spectrum  and  determines  the 
effectiveness  of  a  Chebyshev  filter.  We  note  that  the  largest  eigenvalue  is  in  turn  related 
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Table  2:  Comparison  of  Standard  eigenvalue  problem  vs  Chebyshev  filtered  sub¬ 
space  iteration  (ChFSI) 


Element  Type 

DOFs 

Problem  Type 

N 

Time  (SHEP) 

Time  (ChFSI) 

HEX125SPECT 

1368801 

graphene 

96 

150  CPU-hrs 

12.5  CPU-hrs 

HEX343SPECT 

2808385 

A1  3  x  3  x  3  cluster 

512 

80  CPU-hrs 

13  CPU-hrs 

to  the  finite-element  discretization,  which  increases  with  decrease  in  the  element-size  of 
the  finite-element  mesh.  In  general,  all-electron  calculations  require  locally  refined  meshes 
near  the  nuclei  as  they  involve  Coulomb-singular  potential  fields  and  highly  oscillatory 
core  wavefunctions.  Hence,  a  very  high  degree  of  Chebyshev  polynomial  (of  the  order  of 
102  —  103)  needs  to  be  employed  to  effectively  filter  the  unwanted  spectrum.  On  the  other 
hand,  simulations  performed  on  systems  with  smooth  external  pseudopotential  require 
Chebyshev  polynomial  degrees  between  10  to  30.  As  is  evident  from  the  results,  we  clearly 
see  a  factor  of  12  speedup  that  is  obtained  in  the  case  of  graphene,  and  a  factor  of  around 
6  speedup  that  is  obtained  in  the  case  of  aluminium  cluster.  The  speedup  obtained  was 
even  greater  for  larger  materials  systems. 

The  use  of  spectral  finite-elements  in  conjunction  with  Chebyshev  filtered  subspace 
iteration  presents  an  efficient  and  robust  approach  to  solve  the  Kohn-Sham  problem  using 
the  finite-element  basis.  For  all  the  subsequent  simulations  reported  in  the  present  work, 
we  employ  the  Krylov-Schur  method  for  the  first  SCF  iteration  and  use  the  Chebyshev 
filtering  approach  for  all  subsequent  iterations  to  compute  the  occupied  eigenspace. 


4.2.2  Finite  temperature  smearing:  Fermi-Dirac  distribution 

For  materials  systems  with  very  small  band  gaps  or  those  with  degenerate  energy  levels 
at  the  Fermi  energy,  the  SCF  iteration  may  exhibit  charge  sloshing — a  phenomenon 
where  large  deviations  in  spatial  charge  distribution  are  observed  between  SCF  iterations 
with  different  degenerate  (or  close  to  degenerate)  levels  being  occupied  in  different  SCF 
iterations.  In  such  a  scenario,  the  SCF  exhibits  convergence  in  the  ground-state  energy, 
but  not  in  the  spatial  electron  density.  It  is  common  in  electronic  structure  calculations 
to  introduce  an  orbital  occupancy  factor  [3]  based  on  the  energy  levels  and  a  smearing 
function  to  remove  charge  sloshing  in  SCF  iterations.  A  common  choice  for  the  smearing 
function  is  the  finite  temperature  Fermi-Dirac  distribution,  and  the  orbital  occupancy 
factor  fj  corresponding  to  an  energy  level  e*  is  given  by 


fi  =  f(ti,eF) 


1 

l  +  exp(^^)  ’ 


(62) 


where  the  smearing  factor  a  =  ksT  with  ks  denoting  the  Boltzmann  constant  and  T 
denoting  the  temperature  in  Kelvin.  In  the  above  expression,  cf  denotes  the  Fermi 
energy,  which  is  computed  from  the  constraint  on  the  total  number  of  electrons  given  by 
Yi  2 fi  =  N.  We  note  that  the  convergence  of  ground-state  energy  is  quadratic  in  the 
smearing  parameter  a  [3]. 
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4.2.3  Mixing  scheme: 


The  convergence  of  the  SCF  iteration  is  crucially  dependent  on  the  mixing  scheme,  and 
many  past  works  in  the  development  of  electronic  structure  methods  have  focussed  on  this 
aspect  [61,  36,  62,  63,  64].  In  the  present  work,  we  employ  an  n-stage  Anderson  mixing 
scheme  [36],  which  is  briefly  described  below  for  the  sake  of  completeness.  Let  p^\r) 
and  (r)  represent  the  input  and  output  electron  densities  of  the  nth  self-consistent 
iteration.  The  input  to  the  (n  +  l)th  self-consistent  iteration,  p^+1>  (r),  is  computed  as 
follows 


P  in 


7mix  Pout  4~  (1  Tmix)  Pin 


(63) 


where 


rih 

^m(out) 


n—  1 

/iO)  , 

°n  ^in(out)  '  2—j  °k  ^in(out) 
k= 1 


(64) 


and  the  sum  of  all  the  constants  c*  is  equal  to  one,  i.e., 


Cl  +  C2  +  C3  +  ■  •  •  +  cn  —  1  . 

Using  the  above  constraint,  equation  (64)  can  be  written  as 


rih 

^in(out) 


n— 1 

hi71)  .  (  /?(n_fc) 

Pin(out)  +  °k  y  in(out) 
k= 1 


Oh(n)  ) 
^m(out)  J 


(65) 


(66) 


Denoting  F  =  pknt  —  phm ,  the  above  equation  can  be  written  as 

n—  1 

F  =  +  ^2ck(^F^n-^  -  .  (67) 

k= 1 

The  unknown  constants  ci  to  cn_i  are  determined  by  minimizing  R  =  ||T'||2  =  1 1 pfn  — 
p} out  1 1 2 ,  which  amounts  to  solving  the  following  system  of  (n  —  1)  linear  equations  given 
by: 


n— 1 

F (n)  -  L(n“m),F(n)  -  F^n~k^  ck  =  [F{n)  -  m  =  1  •  •  •  n  -  1  (68) 

k= 1 

where  the  notation  ( F,G )  stands  for  the  L2  inner  product  between  functions  F( r)  and 
G(  r)  and  is  given  by 

(F,G)  =  J  F(r)G(r)dr  .  (69) 

The  value  of  the  parameter  7 mix  hi  equation  (63)  is  chosen  to  be  0.5  in  the  present  work. 
All  the  integrals  involved  in  the  linear  system  (68)  are  evaluated  using  Gauss  quadrature 
rules,  and  the  values  of  p[^ut)  (r)  are  stored  as  quadrature  point  values  after  every  nth  self 
consistent  iteration.  In  all  the  simulations  conducted  in  the  present  work,  the  Anderson 
mixing  scheme  is  used  with  full  history. 
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5  Numerical  results 

5.1  Rates  of  convergence 

We  begin  with  the  examination  of  convergence  rates  of  the  finite-element  approximation 
using  a  sequence  of  meshes  with  decreasing  mesh  sizes  for  various  polynomial  orders  of 
interpolation.  The  benchmark  problems  used  in  this  study,  include:  (i)  all-electron  cal¬ 
culations  performed  on  boron  atom  and  methane  molecule,  which  represent  non-periodic 
problems  with  a  Coulomb-singular  nuclear  potential;  (ii)  pseudopotential  calculations 
performed  on  a  barium  cluster  that  represents  a  non-periodic  problem  with  a  smooth 
external  potential,  and  a  bulk  calculation  of  face-centered-cubic  (FCC)  calcium  crystal. 
In  the  case  of  all-electron  calculations,  the  nuclear  charges  are  treated  as  point  charges 
on  the  nodes  of  the  finite-element  triangulation  and  the  discretization  provides  a  regular¬ 
ization  for  the  electrostatic  potential.  We  note  that  the  self-energy  of  the  nuclei  in  this 
case  is  mesh-dependent  and  diverges  upon  mesh  refinement.  Thus,  the  self  energy  is  also 
computed  on  the  same  mesh  that  is  used  to  compute  the  total  electrostatic  potential, 
which  ensures  that  the  divergent  components  of  the  variational  problem  on  the  right 
hand  side  of  equation  (11)  and  the  self  energy  exactly  cancel  owing  to  the  linearity  of  the 
Poisson  equation  (cf.  Appendix  C  in  [31]  for  a  detailed  discussion). 

We  conduct  the  convergence  study  by  adopting  the  following  procedure.  Using  the  a 
priori  knowledge  of  the  asymptotic  solutions  of  the  atomic  wavefunctions  [34],  we  deter¬ 
mine  the  coarsening  rate  from  equation  (57)  which  is  used  to  construct  the  coarsest  mesh. 
Though  the  computed  coarsening  rates  use  the  far-held  asymptotic  solutions  instead  of 
the  exact  ground-state  wavefunctions  that  are  a  priori  unknown,  the  obtained  meshes 
nevertheless  provide  a  systematic  way  for  the  discretization  of  vacuum  in  non-periodic 
calculations  as  opposed  to  using  an  arbitrary  coarse-graining  rate  or  uniform  discretiza¬ 
tion.  In  the  case  of  periodic  pseudopotential  calculations,  a  finite-element  discretization 
with  a  uniform  mesh-size  is  used.  A  uniform  subdivision  of  the  initial  coarse-mesh  is 
carried  out  to  generate  a  sequence  of  refined  meshes,  which  represents  a  systematic  re¬ 
finement  of  the  finite-element  approximation  space.  The  ground-state  energies  from  the 
discrete  formulation,  E^,  obtained  from  the  sequence  of  meshes  constructed  using  the 
HEX125SPECT  element  and  containing  Ne  elements  are  used  to  fit  an  expression  of  the 
form 

\E0-Eh\=C(l/Ne)2k/3,  (70) 

to  determine  the  constants  Eq,  C  and  k.  The  obtained  value  of  Eq,  which  represents 
the  extrapolated  continuum  ground-state  energy  computed  using  the  HEX125SPECT 
element,  is  used  as  the  reference  energy  to  compute  the  relative  error  in  the 

convergence  study  of  various  orders  of  finite-elements  reported  in  subsequent  subsections. 

5.1.1  All-electron  calculations 

We  first  begin  with  all-electron  calculations  by  studying  two  examples:  (i)  boron  atom 
(ii)  methane  molecule. 

5. 1.1.1  Boron  atom:  This  is  one  of  the  simplest  systems  displaying  the  full  com¬ 
plexity  of  an  all-electron  calculation.  For  the  present  case,  we  use  a  Chebyshev  filter  of 
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order  500  to  compute  the  occupied  eigenspace.  As  discussed  in  Section  4.2.2,  we  use  a 
finite-temperature  smearing  to  avoid  instability  in  the  SCF  iteration  due  to  charge  slosh¬ 
ing  from  the  degenerate  states  at  the  Fermi  energy.  A  smearing  factor  o  =  0.0003168  Ha 
(T=100K)  is  used  in  the  present  study.  We  first  determine  the  mesh  coarse-graining 
rate  by  noting  that  the  asymptotic  decay  of  atomic  wavefunctions  is  exponential,  and  an 
upper  bound  to  this  decay  under  the  Hartree-Fock  approximation  is  given  by  [34] 


~  exp 


for  r  — >  oo  , 


(71) 


where  —  e  denotes  the  energy  of  the  highest  occupied  atomic/molecular  orbital.  While 
the  above  estimate  has  been  derived  for  the  Hartree-Fock  formulation,  it  nevertheless 
provides  a  good  approximation  to  the  asymptotic  decay  of  wavefunctions  computed  using 
the  Kohn-Sham  formulation.  We  use  the  aforementioned  estimate,  though  not  optimal, 
for  all  the  wavefunctions  in  the  atomic  system,  and  adopt  this  approach  for  all  systems 
considered  subsequently.  Hence,  in  equation  (57),  we  consider  xjji  to  be 


where  £  =  V2i . 


(72) 


The  electrostatic  potential  governed  by  the  Poisson  equation  with  a  total  charge  density 
being  equal  to  the  sum  of  5 ^2(r)  and  —5 S(r)  is  given  by 


0(r) 


—5  exp  (— 2£  r ) 


(73) 


Using  the  above  equations,  the  mesh  coarse-graining  rate  from  equation  (57)  is  given  by 


h(r)  =  A 


— £2fc+5  exp  (-2  £  r)  +  25  exp  (-4  £  r) 


£fc+22*+i  ,  ^  fk  +  l\2ne(k+l-n)\ 


£ 

71=0 


rk-n-\-  2 


—  l/(2fc+3) 


(74) 


Since  e  in  the  above  equation  is  unknown  a  priori,  the  value  of  eh  determined  on  a 
coarse  mesh  is  used  in  the  above  equation  to  obtain  h(r).  We  now  perform  the  numerical 
convergence  study  with  tetrahedral  and  hexahedral  spectral  elements  up  to  eighth  order 
using  this  coarse-graining  rate,  and  the  results  are  shown  in  figure  1.  The  value  of  Eq 
computed  from  equation  (70)  is  —24.34319127  Ha,  which  is  used  to  compute  the  relative 
errors  in  the  energies.  We  observe  that  all  the  elements  studied  show  close  to  optimal 
rates  of  convergence,  0(h2k),  where  k  is  the  degree  of  the  polynomial.  An  interesting 
point  to  note  is  that,  although  the  governing  equations  are  non-linear  in  nature  and 
the  nuclear  potential  approaches  a  Coulomb-singular  solution  upon  mesh  refinement, 
optimal  rates  of  convergence  are  obtained.  Recent  mathematical  analysis  [37]  shows  that 
the  finite-element  approximation  for  the  Kohn-Sham  DFT  problem  does  provide  optimal 
rates  of  convergence  for  pseudopotential  calculations.  To  the  best  of  our  knowledge, 
mathematical  analysis  of  higher-order  finite-element  approximations  of  the  Kohn-Sham 
DFT  problem  with  Coulomb-singular  nuclear  potentials  is  still  an  open  problem. 

We  note  that,  in  the  case  of  linear  finite-elements,  a  large  number  of  elements  are 
required  to  even  achieve  modest  relative  errors.  In  fact,  close  to  five  million  linear  TET4 
elements  are  required  for  a  single  boron  atom  to  obtain  a  relative  error  of  10— 2 ,  while 


25 


Motamarri,  Nowak,  Leiter,  Knap,  Sz  Gavini 


Figure  1:  Convergence  rates  for  the  finite-element  approximation  of  a  single  boron 
atom. 


relative  errors  up  to  10~4  are  achieved  with  just  few  hundreds  of  HEX125SPECT  and 
HEX343SPECT  elements,  and  even  higher  accuracies  are  achieved  with  a  few  thousands 
of  these  elements. 


5. 1.1. 2  Methane  molecule:  The  next  example  we  study  is  methane  with  a  C-H 
bond  length  of  2.07846  a.u.  and  a  C-H-C  tetrahedral  angle  of  109.4712°.  For  the  present 
case,  we  use  a  Chebyshev  filter  of  order  500  to  compute  the  occupied  eigenspace,  and  a 
smearing  factor  a  =  0.0003168  Ha  (T=100K)  for  the  Fermi-Dirac  smearing.  As  in  the 
case  of  boron  atom,  the  finite-element  mesh  for  this  molecule  is  constructed  to  be  locally 
refined  around  the  atomic  sites,  while  coarse-graining  away.  The  mesh  coarsening  rate 
in  the  vacuum  is  determined  numerically  by  employing  the  asymptotic  solution  of  the 
far-held  electronic  fields,  estimated  as  a  superposition  of  single  atom  far-held  asymptotic 
helds,  in  equation  (57).  To  this  end,  asymptotic  behavior  of  the  atomic  wavefunctions  in 
carbon  atom  (V'C'(r))  is  chosen  to  be 


^C(r) 


where  £  =  \/2i, 


(75) 


where  e  (negative  of  the  eigenvalue  of  the  highest  occupied  eigenstate)  is  determined 
from  a  coarse  mesh  calculation  of  single  carbon  atom.  The  corresponding  electrostatic 
potential  is  governed  by  the  Poisson  equation,  with  total  charge  density  being  equal  to 
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the  sum  of  6|'0c'(r)|2  and  —6 5(r),  and  is  given  by 

4>(r)  =  —6  exp  (— 2£  r)  .  (76) 

In  the  case  of  hydrogen  atom,  the  analytical  solution  is  given  by 


$H(r)  = 


and  the  corresponding  electrostatic  potential  is  given  by 


(j)(r) 


exp  (—2  r) 


(77) 


(78) 


We  now  perform  the  numerical  convergence  study  with  both  tetrahedral  and  hexahedral 
elements  with  the  meshes  constructed  as  explained  before.  Figure  2  shows  the  convergence 
results  for  the  various  elements,  and  figure  3  shows  the  isocontours  of  electron  density  for 
methane  molecule.  The  value  of  Eq,  the  reference  ground-state  energy  of  the  methane 
molecule,  is  found  to  be  —40.12028478  Ha.  As  in  the  case  of  boron  atom,  we  obtain  close 
to  optimal  convergence  rates,  and  significantly  higher  relative  accuracies  in  ground-state 
energies  are  observed  by  using  higher-order  elements. 


Figure  2:  Convergence  rates  for  the  finite-element  approximation  of  a  methane 
molecule. 
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Figure  3:  Electron  density  isocontours  of  methane. 


5.1.2  Pseudopotential  calculations 

We  now  turn  to  pseudopotential  calculations  in  multi-electron  systems.  A  pseudopoten¬ 
tial  constitutes  the  effective  potential  of  the  nucleus  and  core  electrons  experienced  by 
the  valence  electrons.  Pseudopotentials  are  constructed  such  that  the  wavefunctions  of 
valence  electrons  outside  the  core  and  their  corresponding  eigenvalues  are  close  to  those 
computed  using  all-electron  calculations.  In  the  present  work,  we  use  the  local  evanes¬ 
cent  core  pseudopotential  [65]  as  a  model  pseudopotential  to  demonstrate  our  ideas.  This 
pseudopotential  has  the  following  form 


v*on  =  ~W »  p(1-(1  +  /32/)e 


-ay\ 


-Ae~y 


(79) 


where  Z  denotes  the  number  of  valence  electrons  and  y  =  |r  —  Ri\/Rc.  The  core  decay 
length  Rc  and  a  >  0  are  atom-dependent  constants  [65].  The  constants  /3  and  A  are 
evaluated  by  the  following  relations: 


P 


a3  —  2  a 
4(a2  —  1) 


af3. 


(80) 


5. 1.2.1  Barium  cluster:  The  first  pseudopotential  calculation  we  present  is  a  bar¬ 
ium  2x2x2  body-centered  cubic  (BCC)  cluster  with  a  lattice  parameter  of  9.5  a.u.. 
A  Chebyshev  filter  of  order  16  is  employed  to  compute  the  occupied  eigenspace,  and  a 
smearing  factor  a  =  0.000634  Ha  (T=200K)  is  used  for  the  Fermi-Dirac  smearing.  The 
finite-element  mesh  for  this  molecule  is  constructed  to  be  uniform  in  the  cluster  region 
where  barium  atoms  are  present,  while  coarse-graining  away.  The  mesh  coarsening  rate 
in  the  vacuum  is  determined  numerically  by  employing  the  asymptotic  solution  of  the 
far-held  electronic  fields,  estimated  as  a  superposition  of  single  atom  far-held  asymptotic 
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fields,  in  equation  (57).  To  this  end,  asymptotic  behavior  of  the  atomic  wavefunctions  in 
barium  atom  is  chosen  to  be 


V>(r) 


where  £  =  V2i , 


(81) 


where  e  (negative  of  the  eigenvalue  of  the  highest  occupied  eigenstate)  is  estimated  from 
a  coarse  mesh  calculation.  The  corresponding  electrostatic  potential  is  determined  by 
the  Poisson  equation,  with  total  charge  density  being  equal  to  the  sum  of  2 'fp(r)  and 
—2 5(r),  and  is  given  by 


<t>(r)  =  -2  exp  (-2$  r)  ^  ^  •  (82) 

The  numerical  convergence  study  is  conducted  with  both  tetrahedral  and  hexahedral  el¬ 
ements,  and  Figure  4  shows  the  rates  of  convergence  for  the  various  elements  considered 
that  are  close  to  optimal  rates  of  convergence.  The  value  of  Eq,  the  energy  per  atom, 
is  found  to  be  —0.638599384  Ha.  The  main  observation  that  distinguishes  this  study 
from  the  all-electron  study  is  that  all  orders  of  interpolation  provide  much  greater  accu¬ 
racies  for  pseudopotential  calculations  in  comparison  to  all-electron  calculations.  Linear 
basis  functions  are  able  to  approximate  the  ground-state  energies  up  to  relative  errors 
of  10— 3 ,  whereas  relative  errors  of  10~6  can  be  achieved  with  higher-order  elements  with 
polynomial  degrees  of  four  and  above. 


Figure  4:  Convergence  rates  for  the  finite-element  approximation  of  2  x  2  x  2  barium 
cluster. 
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Figure  5:  Electron  density  contours  of  2  x  2  x  2  barium  cluster. 


5. 1.2.2  Perfect  crystal  with  periodic  boundary  conditions:  The  next  ex¬ 
ample  considered  is  that  of  a  perfect  calcium  face-centered  cubic  (FCC)  crystal  with 
lattice  constant  10.55  a.u..  A  Bloch  ansatz  [47]  is  used  in  the  simulation  with  10  k-points 
(high  symmetry)  to  sample  the  first  Brillouin  zone,  which  represents  a  quadrature  rule  of 
order  2  [66] .  The  eigenspace  is  computed  using  the  Krylov-Schur  method,  and  a  smear¬ 
ing  parameter  of  0.003168  Ha  (T=1000K)  is  used  in  these  simulations.  Figure  6  shows 
the  rates  of  convergence  for  the  various  higher-order  finite-elements  considered  in  the 
present  work.  The  value  of  Eq,  the  reference  bulk  energy  per  atom,  is  computed  to  be 
—0.729544738  Ha.  We  note  that  the  results  are  qualitatively  similar  to  the  pseudopo¬ 
tential  calculations  on  barium  cluster. 


Figure  6:  Convergence  rates  for  the  finite-element  approximation  of  bulk  calcium. 
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5.2  Computational  cost 

We  now  examine  the  key  aspect  of  computational  efficiency  afforded  by  the  use  of  higher- 
order  finite-element  approximations  in  the  Kohn-Sham  DFT  problem.  As  seen  from 
the  results  in  Section  5.1,  higher-order  finite-element  discretizations  provide  significantly 
higher  accuracies  with  far  fewer  elements  in  comparison  to  linear  finite-elements.  How¬ 
ever,  the  use  of  higher-order  elements  increases  the  per-element  computational  cost  due 
to  an  increase  in  the  number  of  nodes  per  element,  which  also  results  in  an  increase  in  the 
bandwidth  of  the  Hamiltonian  matrix.  Further,  higher-order  elements  require  a  higher- 
order  accurate  quadrature  rule,  which  again  increases  the  per-element  computational 
cost.  Thus,  in  order  to  unambiguously  determine  the  computational  efficiency  afforded 
by  higher-order  finite-element  discretizations,  we  measured  the  CPU-time  taken  for  the 
simulations  conducted  on  the  aforementioned  benchmark  problems  for  a  wide  range  of 
meshes  providing  different  relative  accuracies.  All  the  simulations  are  conducted  using 
meshes  with  the  coarse-graining  rates  determined  by  the  approach  outlined  in  Section  3.2. 
All  the  numerical  simulations  reported  in  this  work  are  conducted  using  a  parallel  imple¬ 
mentation  of  the  code  based  on  MPI,  and  are  executed  on  a  parallel  computing  cluster 
with  the  following  specifications:  dual-socket  six-core  Intel  Core  17  CPU  nodes  with  12 
total  processors  (cores)  per  node,  48  GB  memory  per  node,  and  40  Gbps  Infiniband 
networking  between  all  nodes  for  fast  MPI  communication.  The  various  benchmark  cal¬ 
culations  were  executed  using  1  to  12  cores,  while  the  results  for  the  larger  problems 
discussed  subsequently  were  executed  on  48  to  96  cores.  It  was  verified  (see  Section  5.3) 
that  our  implementation  scales  linearly  on  this  parallel  computing  platform  for  the  range 
of  processors  used,  and  hence  the  total  CPU-times  reported  for  the  calculation  are  close 
to  the  wall-clock  time  on  a  single  processor.  The  number  of  processors  used  to  conduct 
ABINIT  and  GAUSSIAN  simulations  for  the  comparative  studies,  discussed  subsequently, 
are  carefully  chosen  to  ensure  scalability  of  these  codes,  and  are  typically  less  than  20 
cores. 

5.2.1  Benchmark  systems 

We  first  consider  the  benchmark  systems  comprising  of  boron  atom,  methane  molecule, 
barium  cluster  and  bulk  calcium  crystal.  The  mesh  coarsening  rates  for  these  benchmark 
systems  derived  in  Section  5.1  are  employed  in  the  present  study.  The  number  of  elements 
are  varied  to  obtain  finite-element  approximations  with  varying  accuracies  that  target 
relative  energy  errors  in  the  range  of  10” 1  —  10”'.  We  employ  the  same  numerical 
algorithms  and  algorithmic  parameters — order  of  Chebyshev  filter,  finite-temperature 
smearing  parameter — as  discussed  in  Section  5.1  for  the  present  study.  The  total  CPU¬ 
time  is  measured  for  each  of  these  simulations  on  the  series  of  meshes  constructed  for 
varying  finite-element  interpolations  and  normalized  with  the  longest  time  in  the  series 
of  simulations  for  a  given  material  system.  The  relative  error  in  ground-state  energy  is 
then  plotted  against  this  normalized  CPU-time.  Figures  7,  8,  9  and  10  show  these  results 
for  boron,  methane  molecule,  barium  cluster  and  bulk  calcium  crystal,  respectively. 

Our  results  show  that  the  computational  efficiency  of  higher-order  interpolations  im¬ 
proves  as  the  desired  accuracy  of  the  computations  increases,  in  particular  for  relative 
errors  of  the  order  of  10”5  and  lower,  which  corresponds  to  chemical  accuracy.  We  note 
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that  a  thousand-fold  computational  advantage  is  obtained  with  higher-order  elements 
over  linear  TET4  element  even  for  modest  accuracies  corresponding  to  relative  errors  of 
10~2.  For  relative  errors  of  ICE3,  quadratic  HEX27  element  performance  is  similar  to 
other  finite-elements  with  quartic  interpolation  and  beyond,  and  sometimes  marginally 
better.  However,  all  higher-order  elements  significantly  outperform  linear  TET4  ele¬ 
ment.  Considering  relative  errors  of  ICE5,  quartic  HEX125SPECT  element  performs 
better  in  comparison  to  quadratic  HEX27  element  almost  by  a  factor  of  10,  while  hexic 
HEX343SPECT  element  is  computationally  more  efficient  than  HEX125SPECT  element 
by  a  factor  greater  than  three  and  this  factor  improves  further  for  lower  relative  errors. 
The  octic  HEX729SPECT  element  performs  only  marginally  better  than  the  hexic  ele¬ 
ment  for  relative  errors  lower  than  ICE5.  Comparing  the  results  across  different  materials 
systems,  we  observe  that  the  performance  of  lower-order  elements  is  inferior  in  the  case 
of  all-electron  systems  in  comparison  to  pseudopotential  systems.  For  instance,  at  a  rela¬ 
tive  error  of  ICE2,  the  solution  time  using  TET4  is  more  than  three  orders  of  magnitude 
larger  than  HEX343SPECT  for  the  case  of  methane  molecule.  However,  the  solution  time 
is  three  orders  of  magnitude  larger  for  TET4  over  HEX343SPECT  for  the  case  barium 
cluster  at  a  relative  error  of  ICE3. 

In  summary,  for  chemical  accuracies  corresponding  to  relative  errors  lower  than  ICE5, 
the  computational  efficiency  improves  significantly  with  the  order  of  the  element  up  to 
sixth-order,  with  diminishing  returns  beyond.  Further,  the  relative  performance  of  higher- 
order  elements  with  respect  to  linear  TET4  element  in  the  case  of  all-electron  calculations 
is  significantly  better  in  comparison  to  pseudopotential  calculations.  Lastly,  qualitatively 
speaking,  the  sequence  of  graphs  of  relative  error  vs.  normalized  CPU-time  for  the  various 
elements  tend  towards  increasing  accuracy  and  computational  efficiency  with  increasing 
order  of  finite-element  interpolation.  However,  we  note  that,  for  the  systems  studied,  the 
point  of  diminishing  returns  in  terms  of  computational  efficiency  of  higher-order  elements 
for  relative  errors  commensurate  with  chemical  accuracy  is  around  sixth-order. 

5.2.2  Large  materials  systems 

In  this  section,  we  further  investigate  the  computational  efficiency  afforded  by  higher- 
order  finite-elements  by  considering  larger  material  systems  involving  both  pseudopoten¬ 
tial  and  all-electron  calculations.  As  a  part  of  this  investigation,  we  demonstrate  the 
effectiveness  of  the  higher-order  finite-elements  by  comparing  the  solution  times  of  pseu¬ 
dopotential  calculations  against  plane-wave  basis  set  and  solution  times  of  all-electron 
calculations  against  a  Gaussian  basis  set  providing  similar  relative  accuracy  in  the  ground- 
state  energy.  The  systems  chosen  as  a  part  of  this  study  are  aluminium  clusters  containing 
3x3x3,  5x5x5,  7x7x7  FCC  unit  cells  for  the  case  of  pseudopotential  calculations, 
and  a  graphene  sheet  containing  100  atoms  in  the  case  of  all-electron  calculations. 

5.2.2. 1  Pseudopotential  calculations:  The  pseudopotential  calculations  on  alu¬ 
minum  clusters  are  conducted  using  the  evanescent  core  pseudopotential  [65].  All  the 
simulations  in  these  case  studies  use  superposition  of  single-atom  electron  densities  as 
the  initial  guess  for  the  electron  density  in  the  first  SCF  iteration.  We  used  the  Krylov- 
Schur  iteration  for  solving  the  eigenvalue  problem  in  the  first  SCF  iteration  and  used 
Chebyshev  filtered  subspace  iteration  for  the  subsequent  SCF  iterations.  The  order  of 
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Figure  7:  Computational  efficiency  of  various  orders  of  finite-element  approxima¬ 
tions.  Case  study:  boron  atom. 


Figure  8:  Computational  efficiency  of  various  orders  of  finite-element  approxima¬ 
tions.  Case  study:  methane  molecule. 
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Figure  9:  Computational  efficiency  of  various  orders  of  finite-element  approxima¬ 
tions.  Case  study:  barium  2x2x2  BCC  cluster. 


Figure  10:  Computational  efficiency  of  various  orders  of  finite-element  approxima¬ 
tions.  Case  study:  bulk  calcium  FCC  crystal. 
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Chebyshev  filters  used  for  the  3x3x3,  5x5x5  and  7x7x7  aluminum  clusters  are  12,  30 
and  50  respectively.  All  simulations  are  conducted  using  a  finite  temperature  Fermi-Dirac 
smearing  parameter  of  0.0003168  Ha  (T=100K).  In  order  to  conduct  a  one-to-one  com¬ 
parison,  the  plane-wave  simulations  are  also  conducted  using  the  same  pseudopotential 
and  finite  temperature  Fermi-Dirac  smearing  used  in  the  finite-element  simulations. 


Aluminium  3x3x3  cluster: 

We  first  consider  an  aluminium  cluster  containing  3x3x3  FCC  unit  cells  with  a  lat¬ 
tice  spacing  of  7.45  a.u..  The  system  comprises  of  172  atoms  with  516  electrons.  The 
finite-element  mesh  for  this  calculation  is  chosen  to  be  uniform  in  the  cluster  region  con¬ 
taining  aluminium  atoms,  while  coarse-graining  away.  The  mesh  coarsening  rate  in  the 
vacuum  is  determined  numerically  by  employing  the  asymptotic  solution  of  the  far-held 
electronic  fields,  estimated  as  a  superposition  of  single  atom  far-held  asymptotic  helds, 
in  equation  (57).  To  this  end,  the  asymptotic  behavior  of  atomic  wavefunctions  in  an 
aluminium  atom  (V’(0)  is  chosen  to  be 


ip{r) 


where  £  =  \/3 7, 


(83) 


where  e  (negative  of  the  eigenvalue  of  the  highest  occupied  eigenstate)  is  determined  from 
a  single  aluminum  atom  coarse  mesh  calculation.  The  corresponding  total  electrostatic 
potential,  governed  by  the  Poisson  equation  with  total  charge  density  being  equal  to  the 
sum  of  3 xjj2(r)  and  —  3<5(r),  is  given  by 


0(r) 


—3  exp  (— 2£  r ) 


(84) 


Table  3:  Convergence  with  finite-element  basis  for  a  3  x  3  x  3  FCC  aluminum  cluster 
using  HEX125SPECT  element 


Degrees  of  freedom 

Energy  per  atom  (eV) 

Relative  error 

184, 145 

-54.1076597 

3.4  xl0“2 

1,453,089 

-56.0076146 

1.8  xlO-4 

11,546,177 

-56.01788889 

1.3  xlO”6 

We  obtain  the  converged  value  of  the  ground-state  energy  by  following  the  procedure 
outlined  in  Section  5.1.  We  use  a  sequence  of  increasingly  rehned  HEX125SPECT  finite- 
element  meshes  on  a  cubic  simulation  domain  of  side  400  a.u.,  and  compute  the  ground- 
state  energy  Ej z  for  these  meshes  which  are  tabulated  in  Table  3.  Using  the  extrapolation 
procedure  discussed  in  Section  5.1  (equation  (70)),  we  compute  the  reference  ground-state 
energy  (energy  per  atom)  to  be  Eq  =  —56.0179603  eV.  The  relative  errors  reported  in 
Table  3  are  with  respect  to  this  reference  energy,  and  this  reference  energy  will  be  used 
to  compute  the  relative  errors  for  all  subsequent  simulations  for  this  material  system. 
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Table  4:  Comparison  of  higher-order  finite-element  (FE)  basis  with  plane-wave 
basis  for  a  3  x  3  x  3  FCC  aluminum  cluster 


Type  of  basis  set 

Relative  error 

Time  (CPU-hrs) 

Plane-wave  basis  (cut-off 
30  Ha,  cell-size  of  60  a.u.) 

3.3  xlO-6 

646 

FE  basis  (HEX343SPECT, 
2, 808, 385  nodes,  domain 
size:  200  a.u.) 

3.6  xlO”6 

371 

In  order  to  assess  the  performance  of  higher-order  finite-elements  on  this  material  sys¬ 
tem,  we  conduct  the  finite-element  simulation  with  a  mesh  containing  HEX343SPECT 
elements  and  compare  the  computational  CPU-time  against  a  plane-wave  basis  code 
ABINIT  [5]  solved  to  a  similar  relative  accuracy  in  the  ground-state  energy.  The  finite- 
element  simulation  has  been  performed  on  a  cubic  domain  size  of  200  a.u.  with  a  mesh 
coarsening  rate  away  from  the  cluster  of  atoms  as  determined  using  equations  (57),  (83),  (84). 
The  resulting  mesh  contains  12, 800  HEX343SPECT  elements  with  2,  808, 385  nodes.  The 
plane-wave  basis  simulation  has  been  performed  by  using  a  cell-size  of  60  a.u.  and  a  cut¬ 
off  energy  of  30  Ha  with  one  k-point  to  obtain  the  ground-state  energy  of  similar  relative 
accuracy  as  the  finite-element  simulation.  The  solution  times  for  the  finite-element  basis 
and  the  plane-wave  basis  are  tabulated  in  Table  4.  These  results  demonstrate  that  the 
performance  of  higher-order  finite-element  discretization  is  comparable,  in  fact  better  by 
a  two-fold  factor,  to  the  plane-wave  basis  for  this  material  system.  Figure  If  shows  the 
electron  density  contours  on  the  mid-plane  of  the  3x3x3  aluminum  cluster  from  the 
finite-element  simulation. 
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Aluminium  5x5x5  cluster: 

We  next  consider  an  aluminium  cluster  containing  5x5x5  FCC  unit  cells  with  a  lattice 
spacing  of  7.45  a.u..  This  material  system  comprises  of  666  atoms  with  1998  electrons. 
The  finite-element  mesh  is  constructed  along  similar  lines  as  the  3x3x3  cluster,  where  a 
uniform  mesh  resolution  is  chosen  in  the  cluster  region  containing  aluminium  atoms  and 
coarse-graining  away  into  the  vacuum  with  a  numerically  determined  coarsening  rate  as 
discussed  earlier.  As  before,  we  first  obtain  the  reference  ground-state  energy  by  using 
a  sequence  of  increasingly  refined  HEX125SPECT  finite-element  meshes  with  a  cubic 
simulation  domain  of  side  800  a.u.  and  extrapolating  the  computed  ground-state  energies 
on  these  meshes  (cf.  Table  5).  The  reference  ground-state  energy  (energy  per  atom), 
thus  determined,  is  Eq  =  —56.0495071  eV. 

Table  5:  Convergence  with  finite-element  basis  for  a  5  x  5  x  5  FCC  cluster  using 
HEX125SPECT  element 


Degrees  of  freedom 

Energy  per  atom(eV) 

Relative  error 

394, 169 

-54.8536312 

2.1  xlO-2 

3, 124,  593 

-56.0425334 

1.2  xlO”4 

24,883,937 

-56.0494500 

1.01  xlO-6 

We  now  assess  the  performance  of  higher-order  finite-elements  on  this  material  system 
in  comparison  to  a  plane-wave  basis.  The  finite-element  simulation  in  this  case  has  been 
performed  on  a  simulation  domain  size  of  400  a.u.  containing  36, 064  HEX343SPECT 
elements  with  7,  875,  037  nodes.  The  plane-wave  basis  simulation  conducted  using  the 
ABINIT  package  has  been  performed  on  a  cell-size  of  80  a.u.  and  a  cut  off  energy  of  30  Ha 
with  one  k-point  to  sample  the  Brillouin  zone  to  obtain  the  ground-state  energy  of  similar 
relative  accuracy.  The  solution  time  for  the  finite-element  basis  and  the  plane-wave  basis 
are  tabulated  in  Table  6,  which  shows  that  using  higher-order  finite-elements  one  can 
achieve  similar  computational  efficiencies  as  afforded  by  a  plane- wave  basis,  at  least  in 
the  case  of  non-periodic  calculations.  Figure  12  shows  the  electron  density  contours  on 
the  mid-plane  of  the  5x5x5  FCC  cluster  from  the  finite-element  simulation. 

Aluminium  7x7x7  cluster: 

As  a  final  example  in  our  case  study  of  pseudopotential  calculations,  we  study  an  alu¬ 
minium  cluster  containing  7x7x7  FCC  unit  cells  with  a  lattice  spacing  of  7.45  a.u.  This 
material  system  comprises  of  1688  atoms  with  5064  electrons.  We  only  use  the  finite- 
element  basis  to  simulate  this  system  as  the  plane-wave  basis  calculation  was  beyond 
reach  for  this  material  system  with  the  computational  resources  at  our  disposal.  The 
finite-element  simulation  has  been  performed  on  a  cubic  simulation  domain  with  a  side  of 
800  a.u..  The  finite-element  mesh  was  constructed  as  described  in  the  simulation  of  other 
aluminum  clusters,  and  comprised  of  69,  984  HEX343SPECT  elements  with  15,  257, 197 
nodes.  The  computed  energy  per  atom  for  this  aluminum  cluster  is  —56.06826762  eV, 
and  figure  13  shows  the  electron  density  contours  on  the  mid-plane  of  the  cluster. 
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Tabic  6:  Comparison  of  higher-order  finite-element  (FE)  basis  with  plane- wave 
basis  sets  for  a  5  x  5  x  5  FCC  aluminum  cluster 


Type  of  basis  set 

Relative  error 

Time  (CPU-hrs) 

Plane-wave  basis  (cut-off 
30  Ha,  cell-size  of  80  a.u) 

2.1  xl0“5 

7307 

FE  basis  (  HEX343SPECT, 
7,  875,  037  nodes,  domain 
size:  400  a.u.) 

7.9  xlO"6 

6619 

Figure  12:  Electron  density  contours  of  5  x  5  x  5  FCC  aluminium  cluster. 
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Figure  13:  Electron  density  contours  of  7  x  7  x  7  FCC  aluminium  cluster. 


5. 2. 2. 2  All-electron  calculations:  We  now  demonstrate  the  performance  of  higher- 
order  finite-element  discretization  in  the  case  of  all-electron  calculations,  by  considering  a 
graphene  sheet  as  our  benchmark  problem.  We  simulate  a  graphene  sheet  containing  100 
atoms  (600  electrons)  with  a  C-C  bond  length  of  2.683412  a.u..  We  first  obtain  a  con¬ 
verged  value  of  the  ground-state  energy  by  conducting  simulations  using  the  GAUSSIAN 
package  [41]  using  the  polarization  consistent  DFT  basis  sets  (pc-n),  which  have  been 
demonstrated  to  provide  a  systematic  convergence  in  Kohn-Sham  DFT  calculations  [67]. 
Since  these  basis  sets  are  not  directly  available  in  the  GAUSSIAN  package,  we  introduce 
them  as  an  external  basis  set  for  conducting  these  simulations.  The  ground-state  energy 
value  obtained  for  triple-zeta  pc-3  basis  set  is  taken  as  the  reference  value  (Eq)  in  this 
study,  which  is  computed  to  be  Eq  =  —37.7619141  Ha  per  atom. 

We  assess  the  performance  of  higher-order  finite-elements  on  this  material  system 
by  comparing  the  computational  CPU-time  against  the  pc-2  basis  set,  which  provides 
similar  relative  accuracy  in  the  ground-state  energy  with  respect  to  the  Eq  determined 
above.  The  finite-element  mesh  for  this  problem  is  chosen  to  be  uniform  in  the  region 
containing  carbon  atoms  with  local  refinement  around  each  atom  while  coarse-graining 
away  into  vacuum.  The  mesh  coarsening  rate  in  the  vacuum  is  determined  numerically 
by  employing  the  asymptotic  solution  of  the  far-field  electronic  fields,  estimated  as  a  su¬ 
perposition  of  single  atom  far-field  asymptotic  fields,  in  equation  (57).  To  this  end,  the 
asymptotic  behavior  of  the  atomic  wavefunctions  in  carbon  atom  (V>(r))  is  chosen  to  be  as 
in  equation  (75).  Since  the  GAUSSIAN  package  does  not  account  for  partial  occupancy 
of  energy  levels,  we  suppress  the  Fermi-Dirac  smearing  in  the  finite-element  simulations 
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for  the  present  case  in  order  to  conduct  a  one-to-one  comparison.  Table  7  shows  the 
relevant  results  of  the  simulation  with  figure  14  showing  the  electron  density  contours  of 
the  graphene  sheet.  We  remark  that  the  finite-element  simulation  with  HEX125SPECT 
elements  is  ten-fold  slower  than  the  GAUSSIAN  simulation  with  pc  basis  set.  We  note 
that  the  pc  basis  set  is  highly  optimized  for  specific  material  systems,  which  is  reflected 
in  the  far  fewer  basis  functions  required  for  these  calculations.  We  believe  this  is  the 
main  reason  for  the  superior  performance  of  Gaussian  basis,  which,  however,  may  not 
be  transferable  to  generic  material  systems-  for  e.  g.  metallic  systems.  We  also  note 
that  the  computational  time  using  finite-element  basis  functions  can  possibly  be  reduced 
significantly  by  enriching  the  finite-element  shape  functions  with  single  atom  wavefunc- 
tions  using  the  partitions-of- unity  approach  [68,  69].  The  degree  of  freedom  advantage 
of  the  partitions-of-unity  approach  for  Kohn-Sham  calculations  has  been  first  demon¬ 
strated  in  [42],  and  presents  a  promising  future  direction  for  all-electron  Kohn-Sham 
DFT  calculations. 

Table  7:  100  atom  graphene  sheet  (600  electrons) 


Type  of  basis  set 

Relative  error 

Time  (CPU-hrs) 

pc2  (Gaussian,  3,  000  basis  functions) 

1.06  xl0~4 

666 

FE  basis  (HEX125SPECT,  8,  004,  003  nodes) 

1.2  xlO”4 

7461 

5.3  Scalability  of  finite-element  basis: 

The  parallel  scalability  of  our  numerical  implementation  is  demonstrated  in  Figure  15. 
We  study  the  strong  scaling  behavior  by  measuring  the  relative  speedup  with  increasing 
number  of  processors  on  a  fixed  problem  of  constant  size,  which  is  chosen  to  be  the 
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aluminum  3x3x3  cluster  discretized  with  HEX125SPECT  elements  containing  3.91 
million  degrees  of  freedom.  The  speedup  is  measured  relative  to  the  computational  CPU¬ 
time  taken  on  2  processors,  as  a  single  processor  run  was  beyond  reach  due  to  memory 
limitations.  It  is  evident  from  the  figure,  that  the  scaling  is  almost  linear.  The  relative 
speedup  corresponding  to  96-fold  increase  in  the  number  of  processors  is  87.82,  which 
translates  into  an  efficiency  of  91.4%. 


Figure  15:  Relative  speedup  as  a  function  of  the  number  of  processors. 


6  Conclusions 

In  the  present  study,  we  have  analyzed  numerically  the  higher-order  adaptive  finite- 
element  discretization  of  the  Kohn-Sham  DFT  problem.  The  present  work  is  focussed 
towards  demonstrating  the  significant  computational  efficiency  in  electronic  structure  cal¬ 
culations  that  is  afforded  by  using  an  adaptive  higher-order  spectral  finite-element  dis¬ 
cretization  in  conjunction  with  appropriate  solution  strategies.  We  use  the  self-consistent 
field  formulation  of  the  Kohn-Sham  DFT  problem  as  our  starting  point.  In  order  to  aid 
our  investigation,  we  first  developed  estimates  for  the  discretization  error  in  the  ground- 
state  energy  in  terms  of  the  ground-state  electronic  fields  (wavefunctions  and  electrostatic 
potential)  and  characteristic  mesh-size.  These  error  estimates  and  the  a  priori  knowl¬ 
edge  of  the  asymptotic  solutions  of  far-held  electronic  fields  were  used  to  construct  mesh 
coarsening  rates  for  the  various  benchmark  problems  considered  in  this  work.  Since  the 
finite-element  discretization  of  the  Kohn-Sham  problem  results  in  a  generalized  eigenvalue 
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problem,  which  is  computationally  expensive  to  solve,  we  presented  an  approach  to  triv¬ 
ially  transform  this  into  a  standard  eigenvalue  problem  by  using  spectral  finite-elements 
in  conjunction  with  the  Gauss-Lobatto  quadrature  rules  that  results  in  a  diagonal  overlap 
matrix.  We  subsequently  examined  two  different  strategies  to  solve  the  Kohn-Sham  prob¬ 
lem:  (i)  explicit  computation  of  eigenvectors  at  every  self-consistent  Held  iteration;  (ii)  a 
Chebyshev  filtering  approach  that  directly  computes  the  occupied  eigenspace.  We  found 
that  the  Chebyshev  filtered  approach  presents  with  significant  computational  savings  for 
both  pseudopotential,  as  well  as,  all-electron  calculations. 

Using  the  derived  error  estimates  and  the  a  priori  knowledge  of  the  asymptotic  solu¬ 
tions  of  far-field  electronic  fields,  we  constructed  close  to  optimal  finite-element  meshes 
for  the  various  benchmark  problems,  which  include  all-electron  calculations  on  systems 
comprising  of  boron  atom,  methane  molecule  and  pseudopotential  calculations  on  barium 
cluster  and  bulk  calcium  crystal.  We  employed  the  Chebyshev  filtering  approach  on  the 
transformed  standard  eigenvalue  problem  in  our  numerical  investigations  to  study  the 
computational  efficiency  of  higher-order  finite-element  discretizations.  To  this  end,  we 
first  investigated  the  performance  of  higher-order  elements  by  studying  the  convergence 
rates  of  linear  tetrahedral  element  and  hexahedral  spectral-elements  up  to  sixth-order. 
In  all  the  benchmark  problems  considered,  we  observed  close  to  optimal  rates  of  conver¬ 
gence  for  the  finite-element  approximation  in  the  ground-state  energy.  Importantly,  we 
note  that  optimal  rates  of  convergence  were  obtained  for  all  orders  of  finite-element  ap¬ 
proximations,  considered  in  this  work,  even  for  all-electron  Kohn-Sham  DFT  calculations 
with  Coulomb-singular  potentials,  the  mathematical  analysis  of  which,  to  the  best  of  our 
knowledge,  is  an  open  question  to  date. 

We  further  investigated  the  computational  efficiency  afforded  by  the  use  of  higher- 
order  finite-elements  up  to  eighth-order  spectral-elements.  To  this  end,  we  used  the  mesh 
coarsening  rates  determined  from  the  proposed  mesh  adaption  scheme  and  studied  the 
CPU  time  required  to  solve  the  benchmark  problems.  Our  results  demonstrate  that 
significant  computational  savings  can  be  realized  by  using  higher-order  elements.  For 
instance,  a  staggering  1000— fold  savings  in  terms  of  CPU-time  are  realized  by  using 
sixth-order  hexahedral  spectral-element  in  comparison  to  linear  tetrahedral  element.  We 
also  note  that  the  point  of  diminishing  returns  in  terms  of  computational  efficiency  was 
determined  to  be  around  sixth-order  for  the  benchmark  systems  we  examined.  To  further 
assess  the  performance  of  higher-order  finite-elements,  we  extended  our  investigations  to 
study  large  materials  systems  and  compared  the  computational  CPU-time  with  commer¬ 
cially  available  plane-wave  and  Gaussian  basis  codes.  We  first  conducted  pseudopotential 
simulations  on  aluminium  clusters  containing  172  atoms  and  666  atoms  using  sixth-order 
spectral-element  in  our  implementation,  as  well  as,  the  plane-wave  basis  in  ABINIT 
package  solved  to  the  similar  relative  accuracy  in  the  ground-state  energy.  These  stud¬ 
ies  showed  that  the  computational  CPU-time  required  for  the  finite-element  simulations 
is  lesser  in  comparison  to  plane-wave  basis  sets  underscoring  the  fact  that  higher-order 
finite-elements  can  compete  with  plane-waves,  at  least  in  non-periodic  settings,  when 
employed  in  conjunction  with  efficient  solution  strategies.  Furthermore,  we  were  able 
to  compute  the  electronic  structure  of  an  aluminium  cluster  containing  1,688  atoms  by 
employing  the  sixth-order  spectral-element,  which  was  not  possible  using  ABINIT  due  to 
large  memory  requirements.  Next,  we  examined  the  computational  efficiency  in  the  case 
of  all-electron  calculations  on  a  graphene  sheet  containing  100  atoms.  The  all-electron 
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calculations  were  conducted  using  polarization  consistent  Gaussian  DFT  basis  sets  and 
the  fourth-order  spectral-element  basis,  and  we  observed  that  the  computational  time  for 
the  finite-element  basis  was  10— fold  greater  than  the  Gaussian  basis. 

The  prospect  of  using  higher-order  spectral  finite-elements  as  basis  functions,  in  con¬ 
junction  with  the  proposed  solution  strategies,  for  Kohn-Sham  DFT  electronic  struc¬ 
ture  calculations  is  indeed  very  promising.  While  finite-elements  have  the  advantages  of 
handling  complex  geometries  and  boundary  conditions  and  exhibit  good  scalability  on 
massively  parallel  computing  platforms,  their  use  has  been  limited  in  electronic  structure 
calculations  as  their  computational  efficiency  compared  unfavorably  to  plane-wave  and 
Gaussian  basis  functions.  The  present  study  shows  that  the  use  of  higher-order  discretiza¬ 
tions  can  alleviate  this  problem,  and  presents  a  useful  direction  for  electronic  structure 
calculations  using  finite-element  discretization.  Further,  the  computational  cost  in  the 
case  of  all-electron  calculations  can  be  further  reduced  by  enriching  the  finite-element 
shape  functions  with  single-atom  wavefunctions  using  the  partitions-of-unity  approach, 
and  is  currently  being  studied.  Last,  but  not  the  least,  the  implications  of  using  higher- 
order  spectral  finite-element  approximations  in  the  development  of  a  linear  scaling  Kohn- 
Sham  DFT  formulation  is  a  worthwhile  subject  for  future  investigation. 
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